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ABSTRACT 

An  adaptive  recursive  digital  filter  is  presented  in 
which  feedback  and  feedforward  gains  are  adjusted  adaptively 
to  minimize  a  least  square  performance  function  on  a  sliding  win 
dow  averaging  process.   A  two-dimensional  version  of  the  adap- 
tive filter  is  developed  and  its  performance  compared  with 
the  optimal  Wiener  filter.   The  filter  is  shown  to  be  effec- 
tive in  separating  three  diagonal  trajectory  streaks  from  a 
background  of  correlated  noise  added  to  ivhite  noise.   Although 
the  recursive  adaptive  filter  approaches  the  optimal  Wiener 
filter  in  performance,  it  does  not  require  a  priori  statistical 
knowledge  as  does  the  Wiener  filter  to  which  it  is  compared. 
The  results  indicate  that  the  recursive  adaptive  filter  "learns" 
the  statistics  and  adapts. 
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I.   INTRODUCTION 

The  term  "filter"  is  often  applied  to  any  device  or 
system  that  processes  incoming  signals  or  other  data  in  such  a 
way  as  to  eliminate  noise,  to  smooth  signals,  to  identify  each 
signal  as  belonging  to  a  particular  class,  or  to  predict  the 
input  signal  from  instant  to  instant.   There  is  an  abundance 
of  literature  covering  the  theories  involved  under  the  head- 
ings of  estimation,  identification,  modeling,  prediction,  etc. 
The  usual  method  of  estimating  a  signal  corrupted  by  noise  is 
to  pass  it  through  a  filter  that  tends  to  suppress  the  noise 
while  leaving  the  signal  relatively  unchanged.   The  design  of 
such  filters  falls  in  the  domain  of  optimal  filtering,  which 
originated  with  the  pioneering  work  of  Wiener  [8]  and  was  ex- 
tended and  enhanced  by  the  work  of  Kalman-Bucy  [9]  and  others. 

Filters  used  for  the  above  purpose  can  be  fixed  or  adap- 
tive.  The  design  of  a  fixed  filter  is  based  on  a  priori  know- 
ledge of  both  signal  and  noise  statistics.   On  the  other  hand, 
adaptive  filters  have  the  ability  to  adjust  their  own  para- 
meters automatically,  and  their  design  requires  little  or 
no  a  priori  knowledge  of  signal  or  noise  characteristics. 
This  work  presents  an  approach  to  signal  filtering  using  an 
adaptive  filter  that  is  in  some  sense     self -designing. 
The  adaptive  filter  described  here  bases  its  own  "design"  (its 
adjustment  of  internal  parameters)    upon  estimated  (measured)  sta- 
tistical characteristics  of  input  and  output  signals. 


The  statistics  are  not  measured  explicitly  and  then  used 
to  design  the  filter;  rather,  the  filter  design  is  accomplished 
in  a  single  process  by  a  recursive  algorithm  that  automatically 
updates  the  system  parameters  with  the  arrival  of  each  new  data 
sample.   It  is  assumed  that  the  input  and  the  output  at  the 
sampling  instants  are  the  only  measurable  quantities  of  the 
system.   It  is  also  assumed  that  the  unknown  filter  coefficients 
parameters)  to  be  designed  enter  linearly  in  the  difference 
equations  which  describes  the  self -designing  process. 

The  steepest  descent  method  is  employed  in  which  the  pre- 
vailing filter  parameter  vectors  are  perturbed  at  each  itera- 
tion in  a  manner  so  as  to  decrease  a  prescribed  functional 
(error  criterion  or  cost  function)  to  be  minimized.   The  steep- 
est descent  method  is  one  of  the  well  known  gradient  based 
algorithms. 

For  the  case  where  the  functional  being  minimized  is  the 
mean  square  error,  where  the  error  is  the  difference  between 
filter  output  signal  and  the  desired  signal,  the  filter  is 
called  the  least  mean  square  filter  (LMS  filter) .   Various 
adaptive  algorithms  are  currently  available  depending  upon 
the  cost  function  and  the  method  used  to  minimize  cost  function. 

The  popularly  used  performance  criteria  are  the  least  mean 
square  criterion,  the  maximum  likelihood  ratio  (MLR)  criterion, 
and  the  maximum  signal  to  noise  ratio  (SNR)  criterion.   Here 
the  LMS  criterion  only  is  studied  and  the  steepest  descent 


method  is  employed.   Inevitable  errors  in  the  estimation  of 
the  statistics  prevent  the  adaptive  filter  from  delivering 
optimal  performance  since  the  adaptive  filter  is  not  based  on 
the  a  priori  knowledge  of  statistics.   In  Chapter  II,  the  con- 
cept of  linear  stochastic  processes  is  reviewed  as  a  preliminary 
study  for  this  thesis,  and  the  modeling  of  stochastic  processes 
is  studied.   These  can  be  considered  as  background  material 
for  the  following  chapters. 

In  Chapter  III,  the  concept  of  adaptive  filters  is  intro- 
duced and  the  structure  of  the  signal  and  the  mathematical 
model  of  the  processor  is  delivered.   The  algorithm  for  the  non- 
recursive  adaptive  filter  by  Widrow  [1]  is  reviewed  and  the  new 
algorithm  for  the  recursive  adaptive  filter  is  developed  as  is 
the  two-dimensional  adaptive  filtering  process. 

In  Chapter  IV,  the  adaptive  noise  cancelling  concept  is 
analyzed  rather  qualitatively  and  its  application  to  the  special 
case  in  which  no  desired  signal  is  available  is  analyzed.   In 
Chapter  V  an  experiment  is  performed  through  computer  simulation 
to  check  the  feasibility  of  algorithms  developed  in  the  pre- 
vious chapter  and  a  comparison  with  the  optimal  Wiener  solution 
is  made.   In  Chapter  VI,  the  conclusions  are  presented  together 
with  a  summary  of  the  experimental  results  and  suggestions  for 
further  research. 


II.   LINEAR  STOCHASTIC  PROCESSES 

A.   INTRODUCTION 

The  problem  of  defining  a  random  process  is  of  considerable 
importance  in  the  analysis  of  systems  subject  to  noise  distur- 
bance.  Often  a  partial  definition  of  the  process  will  suffice 
as  in  the  case  of  linear  least  mean  square  error  filtering, 
where  only  a  knowledge  of  the  correlation  function  is  required. 
For  other  problems,  such  as  those  involving  nonlinear  filtering, 
more  complete  information  will  generally  be  needed.   A  complete 
description  of  a  random  process  requires  a  knowledge  of  the 
distribution  functions  of  all  orders.   But  in  practice  few 
processes  apart  from  the  normal  and  Markov  are  defined  in  this 
manner.   For  the  purpose  of  analysis,  a  model  to  generate  the 
random  process  is  desirable  and  for  a  model  to  give  a  complete 
description  of  the  process,  the  distribution  functions  should  be 
derivable  from  the  model.   While  both  continuous  and  discrete- 
time  linear  process  may  be  defined,  only  the  discrete -time  case 
will  be  considered  here.   The  discrete -time  linear  process 
can  be  considered  to  be  the  result  of  the  digital  filtering  of 
a  sequence  of  independent  and  identically  distributed  (IID) 
random  variables. 

The  linear  processes  are  important  since  they  are  inherently 
simple  in  terms  of  physical  considerations  and  form  a  class 
which  includes  many  discrete  time  normal  random  processes. 
In  the  following  section  the  definition  and  properties  of  the 
linear  processes  are  summarized. 


B.   DEFINITION  AND  PROPERTIES  OF  LINEAR  STOCHASTIC  PROCESSES 

It  has  been  found  useful  in  the  theory  of  stochastic  processes 
to  divide   stochastic  processes  into  two  broad  classes:   sta- 
tionary and  nonstationary .   Intuitively, a  stationary  process 
is  one  whose  distribution  remains  the  same  as  time  progresses, 
because  the  random  mechanism  producing  the  process  is  not  chang- 
ing as  time  progresses.   A  nonstationary  process  is  one  which 
is  not  stationary. 

Let  {x(i),  ieT}  be  a  stochastic  process  with  finite  second 
moments.  Its  mean  value  sequence  ,  denoted  by  m(i) ,  is  defined 
for  all  i  in  T  by 

m(i)  =  E  [x(i)]  (2-1) 

and  its  covariance  kernel,  denoted  by  K  (j.i),  is  defined 
for  all  j  and  i  in  T  by 

K  (j.i)  =  Cov  [  x  (j),  x  (i)]  (2-2) 

An  index  set  T  is  said  to  be  a  linear  index  set  if  it  has  the 
property  that  the  sum  i+-j  of  any  two  numbers  i  and  j  of  T  also 
belongs  to  T.   Examples  of  such  index  sets  are  T=  {1,2  ,  .  .  .}, 
T  =  {o,±l,  ±2,  ....},  T={i;i>o}  and  T={i:-°°  <  i  <  »  } 
A  stochastic  process  (x(i),  ieT  },  whose  index  set  T  is  linear, 
is  said  to  be 

i)  strictly  stationary  of  order  k,  wherek  is  a  given  positive 
integer,  if  any  k  points  i,  i+1,  . . . i  +  k  in  T+,  where 
T+  -  (x  (i)  ,  i>_o},  and  any  j  in  T+,  the  k  dimensional  random 
vectors  (x(i) ,  x  (i+1).  .  x  (i+k) }and{  x(i+j),.  .  .  x(i+j+k)} 
are  identically  distributed. 
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ii)   strictly  stationary  if  for  any  integer  k,  it  is 
strictly  stationary  of  order  k. 

iii)   widesense  stationary  (covariance  stationary)  if  it 
possesses  finite  second  moments,  if  its  index  set  T  is  linear, 
and  if  its  covariance  kernel  K  (j  ,  i)  is  a  function  only  of  the 
absolute  difference  |j-i|,  in  the  sense  that  there  exists  a 
function  R(n)  such  that  for  all  j  and  i  in  T 

K(j,i)  =  Cov[x(j),  x(i)]  =  Rxx(j-i)         (2-3) 
or  more  precisely,  Rj,,,^)  has  the  property  for  every  i  and  j  in  Z+ 

Cov  [x(i) ,x (i+m) ]  =  E [x (i) x (i+m) ] 

=  Rxx(m)  (2-4) 

We  call  RxX(m)  the  covariance  function  (autocorrelation  function) 
of  wide  sense  stationary  time  series  (x(i),  ieT+}  . 

The  second  problem  concerns  the  concepts  of  ergodicity 
and  the  strong  law  of  large  numbers  in  terms  of  linear  processes 
To  present  a  complete  discussion  of  this  question  is  not  rea- 
sonable for     review  purposes,  but  it  is  interesting  to  con- 
sider certain  aspects  of  it.   For  strict  sense  stationary  pro- 
cesses, the  ergodic  theorem  is  the  strong  law  of  large  numbers 
and  states  that 

if{x(i),ieT  }  is  a  strict  sense  stationary,  ergodic 

random  process  and  E  [  |x  (o)  |  ]  <°°, 

m 
then  l^m  _J_  Z   x  (i)  =E  [x  (o)  ]  with  probability  1.     (2-5) 
m  i=l 


12 


In  general,  a  stochastic  process  is  said  to  be  ergodic  if  it 
has  the  property  that  sample  (or  time)  averages  formed  from 
an  observed  sequence  of  the  process  may  be  used  as  an  approxima- 
tion to  the  corresponding  ensemble  average. 

Stationarity  and  ergodicity  concepts  are  readily  extended  to 
two-dimensional  random  fields  (for  two  dimensional  signal,  the 
term  random  field  is  preferred  to  random  process) . 

The  assumption  that  the  field  is  stationary  means  that 
the  statistics  of  a  point  in  the  field  are  not  dependent  on  the 
location  of  the  point.   Then,      a  stationary,  two-dimensional 
field  has  an  autocorrelation  function  defined  as: 

Rxx(m,n)A  E{x(k,£)x(k+m,£+n) }  (2-6) 

and  it  is  also  said  to  be  ergodic  if  the  statistical  (ensemble) 
average  of  random  field  x(k,l)  is  equal  to  the  spatial  averag- 
ing of  all  points.   That  means 

E[x(k,i)]  =  <x>  (2-7) 

where  <x>  by  definition  represents  spatial  averaging 

<x>=  lim   -i-  _LMiMi        . 

Mi-00   m     m  ^  5        '  (2-8) 

M      Mi     M2o  o 

2-M» 

Now,  consider  a  stationary  sequence  of  random  variables 
{x(i) ,  ieT}.   The  correlation  function  of  the  sequence  may 
be  written  in  the  form 

I^x(n^      e  1Wn  dF(w)  (2-9) 

L 
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where  F (w)  is  a  nondecreasing  function,  called  the  spectral 
distribution.  If  F(w)  is  absolutely  continuous  with  F  (w)  = 
f (w)  almost  everywhere,  we  may  write 


it   iwn 

-7T 


Rxx(n)  =  |     e     f(w)  dw7  (2-10) 


Under  certain  conditions,  (e.g.    £  |R(n)|<<»,  finite),  the 

n=o 
correlation  function  may  be  inverted  to  yield  the  spectral 

density  as 

f(w)=  _1_     i    Rxx(n)e~inw  (2-11) 

2tt    -°° 

A  sequence  of  random  variables(x (i) ,  ieT}  is  said  to  be  a 
process  of  moving  average  if  it  admits  the  mean  square  repre- 
sentation 


x  (i)  =_E   h(i-j)  u(j)  (2-12) 

where(u  (j),  jeT}   is  a  collection  of  orthornomal  random 
variables  (sequence  of  white  noise) .   If  the  sequence  (h(i) , 
ieT}  is  one  sided  (ie,  h(i)  =  0,  i<0),  then  {x(i),  ieT}  is 
called  a  one  sided  moving  average  process. 

A  process  is  said  to  be  regular  if  the  error  for  prediction 
one  time  unit  ahead  is  nonzero.   It  is  known  that  a  process  is 
one  of  moving  averages  if  and  only  if  its  spectral  distribution 
function  is  absolutely  continuous.   Furthermore,  a  process  with 
an  absolutely  continuous  spectral  distribution  function  is 
regular  if  its  moving  average  representation  is  one  sided.   These 
facts  serve  to  motivate  the  following  definition  of  a  linear 
process . 

l  A 


DEFINITION  [5] 

A  linear  process  {x(i)/ieT}  is  one  having  the  structure 

j 
x(j)  =  Eh(i)u(j-i)  =  Zh(j-i)u(i)  ,        (2-13) 

o  -°° 

where  {u(i),  ieT}  is  a  sequence  of  independent  and  identically 
distributed  (IID)  random  variables.   The  set  of  real  constants 

00 

{h(i) ,  ieT+}   is  such  that  E|h(i)|  <<»,  and  the  function 

o 

00 

H(z)  =   Z  h(i)  z~L   where  z  is  a  complex  variable,  is  analytic 

o 
and  has  no  poles  outside  the  unit  circle  in  the  z  plane.  The 

correlation  function  of  this  process  is  given  by 

5£n)  =  g  h(j)h(j+n)  (2-14) 

and  the  corresponding  spectral  density  f(w)  is 

f(w)  =  -  |?  h(j)e^w|2  =  i  |  H(eiw)  |2      (2-15) 

2TT    °  2* 

It  is  assumed  further  that  the  process  (2-13)  has  a  rational 
spectral  density, that  is 


f(w)  =  i  I  H(elw) I2  =  l 


B(e   ) 


A(elw) 


2  ,  (2-16) 


where  both  A(e   )  and  B(elw)  are  polynomials  in  elw  of 
finite  order  with  all  their  poles  inside  the  unit  circle. 
The  process  (2-13)  may  be  generated  by  passing  the  sequence 
{u(i),  ieT}  through  the  digital  filter  H(z).   By  the  assumption 
in  the  definition  of  the  linear  process  and  by  the  restrictions 
on  the  spectrum,  there  exists  an  inversion  D(z)  where 

D(z)  =    -MeL     =     _1 — 


B(Z)  H(z) 


In  general,  D(z)  and  H(z)  may  be  infinite  polynomial  in  z. 
D(z)  may  be  written  as  the  one-sided  sequence 

D(z)  =  g  d^"1  (2-17) 

Passing  the  x(i)  sequences  through  the  digital  filter  will 
recover  the  generating  sequence  u(i),  that  is 

J 
u(j)  =_E  dtj-j)x(ij=   §  dli)x(j-i)  (2-18) 

This  is  called  an  operation  inversion.   In  general,  we  will  have, 

assuming  a0  =  1 

n    _i 

Eb .  z  oo 

H(z)  =  -2-=- r-       =   E  h(i)z_1        (2-19) 

1  +  y   a  .  z         o 

and  the  process{x (i) ,  ieT}  may  be  represented  in  two  ways 

i)   x(j)  =  E  h(i)u(j-i) 

(2-20) 
m  .  n 

n)   x(j)  =  i   biu(j-i)-  laix(j-i) 
o  1 

The  second  representation  indicates  that  if  H(z)  is  an  all- 
pole  function,  then  we  have 

m 
iii)  E  a^x(j-i)  =  u(j).  (2-21) 

o 

In  this  case,  since  inversion  uses  only  a  finite  number  of  past 
samples,  the  process  is  called  "finitely  invertable."   It  is 
clear  that  finitely  invertable  linear  processes  form  a  subclass 
of  autoregressive  schemes  for  which  case  the  set  {u(i)}  in 
(2-21)  would  be  orthonomal  rather  than  independent.   The  con- 
cepts and  definitions  above  can  be  readily  extended  to  a 


two-dimensional  linear  process.   A  two-dimensional  linear 
process  {x(m,n),  TTiezj,  nez2)  could  have  the  structure 


x(m,n)  =   ZZh(k,g )u(k-m,£-n) 
oo 
m  n 
=  ZZ  h(k-m,ft-n)u(k,H)  , 


(2-22) 


where  (u(k,§L),  kez-]_,  £.£Z2}  is  a  two-dimensional  sequence  of 
IID  random  variables  with  zero  mean  and  unit  variance.   The  set 
of  real  constant  {h(k.SL),  kez,  +  ,  isz2+}      is  such  that 

0O0O  ,        n 

0000  1         n        1  (1       —  lc     —  V 

ZZ I  h(k,j))  |<°°  ,  and  the  function  H(Z,,Z0)  =  EZh(k,X)Z,   Z0  S 

00  J.   z     00         ±    z 

where  Z,  and  Z2  are  complex  variables,  is  analytic.   The  correla- 
tion function  of  this  process  is  given  by 

Rw<m'n)  =  ??h(k,Jl)h(k+m,!+n)  (2-23) 

•*•■*       00 

and  the  corresponding  spectral  density  f(w,,W2)  is 
f(w1#w2)  =  (^)2   I?  h(k,l)eiwlk  eiw2^ 


I  ^2 

2TT. 


H(eiwl,  eiw2) 


(2-24) 


With  the  same  reasoning  as  in  the  one-dimensional  case,  we  have 
i  n  general , 


H(ZX,  Z2) 


Nl   N2 

Z    Z  b 

0    0    ij   1   2 


Z-1Z  ^ 


M,  M9 
1+Z1  Z 


aij  Z,  1  Z2_:i 


1=0  j=o 
(i, j)^(o,o) 

00  00         n  —V   — u 

5  g  h(k.J)z1  kz2  l. 


(2-25) 


and  the  process  (x(k,l),  keZ,,  leZ^Jmay  be  represented  in  two 
ways 

i)  x(k,H)  =  ??h(i.j)u(k-i,l-j) 
oo 
N1N2  Mi  M2 

ii)  x(k,£)  =  ZIb.  .u(k-i,j2.-j)  -      E  I   a,  ,x  (k-i,  jL-j)   (2-26) 

00  1J  00   J-j 

(i  rj)^(0,0) 

It  should  be  noted  that  the  moving  average  scheme  would  be  ac- 
complished by  passing  the  orthonomal  IID  random  variables  through 
the  nonrecursive  filter  and  the  autoregressive  schemes  through 
the  recursive  filter  for  both  one-dimensional  and  two-dimensional 
random  process. 

In  the  study  of  systems  subject  to  the  random  signals,  the 
concept  of  power  spectral  density  function  is  of  importance. 

For  a  given  transfer  function  of  a  linear  filter,  the  cross  power 
spectrum  between  input  and  output  of  the  filter,  and  the  output 
power  spectrum  is  of  primary  concern. 

Consider  a  (continuous)  system  subjected  to  a  random  input 
signal.   Given  a  linear  system  with  a  transfer  function 
H(jw),  the  input  to  the  filter  a  stationary  process  x.(t) 
with  an  autocorrelation  function  R   (z)  and  a   power  spectral 
density  function  G   (w)  ,  then  the  following  relationships 
are  obtained 

Gxy(w)  -  Gxx(w)H*(jw) 

G„„(w)  =  G   (w)H(jw)  (2-27) 

yy       *\Y 

Combining  above  two  equations, 

Gyy(w)  =  Gxx(w)|H(jw)|2  ,  (2-28) 
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where  G   (w)  is  the  cross  power  spec trum  and  G   (w)  is  the 
output  power  spectrum,  the  cross  correlation  would  be  cal- 
culated by  using  the  inverse  Fourier  transform  of  G   (w) . 

For  the  continuous  two-dimensional  linear  system  of 
H(Jwi/Jw2)/  subjected  to  a  stationary  two-dimensional  input 
signal  of  power  spectrum  G xx (w, ,  w2) . 

Bxy(wl'w2*  =  Gxx(wl'  W2)H^W1'^W2^ 

Gyy(wl'  w2)  =   Gxy(wl'  w2}   H  (  jwl'  jw2)      (2-29) 
Again  combining  above  two  relationships,  the  output  power 
spectral  density  function  is 

G   (w1#  w2)  =  Gxx(w1,  w2)|H(jw1#  jw2) |         (2-30) 

Consider  a  discrete  linear  system  to  which  a  random  input  se- 
quence is  applied  with  a  transfer  function  H(z).   The  input 
sequence  is  a  stationary  process  x(i)  with  an  autocorrelation 

function  Rvv(m)  and  its?-transform  G   (z),  where  G   (z)  is  equi- 
xx  J  xx    '        xx        ^ 

valent  to  the"  power  spectral  density  in  the  continuous  case,i.e 

Gxx<2>      4      3»Wm>]  (2-3D 

Then, 


G   (z)  =  G   (z)  H(z) 
xy       xx K  '   v 

Gw(z)  =  G   (z)  H(z)H(z_1)  (2-32) 

For  the  two-dimensional  discrete  case, 

Gxy(zl'    z2}    =   Gxx(zl'    Z2)H^1'    z2} 
and   G  y^1^2)    =   Gxx(zl'    z2}  H  (zl'  z2)  H  ^l^    Z~2~]  (2_33) 
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As  a  summary  of  this  section,  a  discrete-time  linear  process  can  be 
considered  to  be  the  result  of     digital  filtering  an  indepen- 
dent identical  random  sequence  having  zeromean  and  unit  variance, 
The  moving  average  scheme  is  the  result  of  filtering  through  the 
nonrecursive  filter  and  the  autoregressive  scheme  is  that  of 
filtering  through  the  recursive  filter. 

And  for  a  linear  system,  the  relations  between  transfer 
function,  power  spectrum,  and  auto  correlation  are  given  by 

G    =   G    |h|  2 

yy      xx  '  ' 

G    =   G    H 
xy      xx 

R  (autocorrelation  function)  -  7[  transform  of  power 

spectral  density  function. 

The  Figure  2-1  shows  the  block  diagram  which  describes 
the  various  relations  and  concepts. 
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ORTHONORMAL 
I . I . D . RANDOM 
SEQUENCE 


WHITE  NOISE  \ 
OF  ZERO  MEAN\ 

AND  UNIT 

'VARIANCE 


; 


LINEAR  SYSTEM 
H 


LINEAR 
CORRELATED 
RANDOM 
PROCESS. 


G       (Z) 
xx 


x(n) 

RANDOM    INPUT 


G    (Z)=G       (Z)H(Z)H(Z    1) 
VV  XX 


y(n) 


GxxV    Z2) 


x(m,n) 


H(Z1,    Z2) 

h (m,n) 


JGyy(Zlf    Z2)=GXX(Z1,Z2)H(Z1,Z2) 


H(Zjlz2l) 


WHITE    NOISE 


G       (Z,,    Z0)=l 
xx      1 '       2 


G       (Z,  , 

yy    i 


z2) 


H(Z1/    Z2)H(Z1"1/Z2~1) 


FIGURE    2  -1 

STATISTICAL 

PROPERTIES  OF  LINEAR  SYSTEM 
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C.   MODELING  OF  LINEAR  STOCHASTIC  PROCESS 


1.   Introduction 


A  more  active  concern  at  this  time  is  that  of  system 
modeling.   It  has  been  shown  in  a  previous  section  that  a  linear 
stochastic  process  (or  field)  can  be  generated  by  filtering 
white  noise  through  a  linear  filter.   The  problem  can  be  stated 
as  follows.   What  is  the  filter  equation  (difference  equation) 
that  produces  a  typical  random  process  with  a  specified  autocorrelation 
function?   That  is,  with  the  knowledge  of  second -order  statistics, 
determine  the  filter  coefficient  a's  and  b's  in  equations  (2-20) 
and  (2-26) .   It  is  clear  that  if  one  is  successful  in  developing 
a  parametric  model  for  the  behavior  of  some  random  process, 
then  the  model  can  be  used  for  different  applications,  such  as 
prediction,  estimation,  smoothing,  etc..  As  far  as  the  general 
modeling  problem  goes,  one  of  the  most  powerful  models  currently 
in  use  is  that  where  a  signal  x(n)  is  considered  to  be  the  out- 
put of  some  system  (filter)  with  unknown  input  u(n)  such  that 

the  following  relationship  holds 

P  q 

x(n)  =  -  I   clkx(n-k)  +  gZQ  b£u(n-£),  b0=l        (2-34) 

JC*-  J. 

where  a^,  1<_   k<_  p  ,  b£  1  <_5,<_q 

andthe  gain  G  are  the  parameters  of  the  hypothesized  system. 
Equation  (2-23)  says  that  the  "output"  x(n)  is  a  linear  function 
of  past  outputs  and  present  and  past  inputs.   That  is,  the 
signal  x(n)  is  predictable  from  linear  combinations  of  past 
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outputs  and  inputs .  For  1he  two-dimensional  case,  the  difference 

equation  corresponding  to  equation  (2-34)  would  be 

Mx   M2  Lj    L2 

x(k,£)  =  -  Z    Z   ai-x(k-i/£-j)+G   Z     Z    b  ,   2u  (k-^,  £-£2) 
i=o  j=o    -1  £,=u  £2=u     ' 

where  (i,j)^(0, 0)  (2-35) 

Equations (2-34)ard(2-35)  can  also  be  specified  in  the  frequency 

domain  by  taking  the  z  transform  of  both  sides  of  Eq  (2-34) and  Eq 

(2-35)  . 

q     -£ 

H(z)  =  jL<f>  =  G  l+   *i,1'*' (2"36) 

U(Z)  p      , 

1+   Z  a.  Z  K 
k=l  k 

oo 

where  X(Z)  =   Z   x(n)Z~n    is  the  Z  transform  of  x(n),  and  u(Z) 

n=  — °° 
is  the  z  transform  of  u(n) . 

For  the  two-dimensional  case, 


L,   L9  -£..  -£0 

s     i      \  *2zi     z2  (2-37) 


X(Z,,Z~)    £,=0£9=0 


U<  W         Z1  Z2    a   Z"1  Z  "3 
x  A  1  +.L      L  aijzl   z2 

i=o  j=o  J 

(i. j)^(o.o) 
where  X(Z1,Z2)  =  3[x(k,£)] 
U(Z1,Z2)  =  3[u(k,£)] 
H(Z)an<}I(Zlf  Z2)  in  Eqs  (2-36)arri  (2-37)  are  the  general  pole  zero 
models. 
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The  roots  of  the  numerator  and  denominator  polynomials  are  the 
zeros  and  poles  of  the  model,  respectively.   There  are  two 
special  cases  of  the  model  that  are  of  interest, 
i)  all- zero  model  ak  =  0,  a-ij  =  0 
ii)  all- pole  model   b  =  0,  bJU£2  =  0 

1  £  l  1  3 

0  <_  £1<_  L1 

0  <_  l2<    L2 

But  (£1,  £2)^  (0,0) 
As  mentioned  in  section  II -B,   the  all- zero  model  is  known 
in  the  statistical  literature  as  the  moving  average  (MA)  model 
and  the  all -pole  model  is  then  known  as  the  autoregressive  (AR) 
model.   The  pole-zero  model  is  then  known  as  the  autoregressive 
moving  average  (ARMA)  model.   It  should  be  recalled  here  that  we 
are  interested  in  the  case  where  u(n)  or  u(kTi)  is  white  noise,  and 
this  will  be  treated  as  a  special  case  in  the  following.   The 
modeling  problem  can  be  stated  as"given  signal  x(n)  or  x(k,2j, 
find  the  filter  coefficients  (a's  and  b's)and  the  gain,  G, in 
Equation  (2-34)  in  some  manner."   Two  approaches  will  be  given  for 
a  solution  of  the  above  problem.   The  first  is  the  method  of  least 
squares which  is  based  on  the  optimal  estimation  concept,  and  the 
second  isthefilter  response  method  in  which  linear  system  pro- 
perties are  used.   The  one-dimensional  case  will  be  treated 
first,  then  two-dimensional  case  including  some  examples.  For 
example,  a  lowpass  correlated  random  process  (field)  and  band 
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limited  random  process  (field)  are  chosen  since  they  represent 
practical   examples. 

2.   The  Method  of  Least  Squares 

Athough the following  can  be  applied  to  the  deterministic 
signal  and  stochastic  processes, stationary  or  nonstationary , 
it  is  emphasized  for  only  a  stationary  random  process  and  only  the 
all-pole  model  is  considered  [6] .  In  the  all- pole  model,  we 
assume  that  the  signal  x(n)  is  given  by  as  a  linear  combination 
of  past  values  and  some  input  u(n) : 

P 
x(n)  =  -   Z    Avx(n-k)  +  Gu(n)  (2-38) 

k=l   K 

where  G  is  a  gain  factor. 
Here,  it  is  assumed  that  the  input  u(n)  is  totally  unknown, 
which  is  the  case  in  many  applications.   Therefore,  the 
signal  x(n)  can  be  predicted  only  approximately  from  a  linearly 
weighted  summation  of  past  samples. 

Let  this  approximation  of  x(n)  be  x(n),  where 

x(n)  =  -  I        Avx(n"k)  /  (2-39) 

k=l    K 

then  the  error  between  the  actual  value  x(n)  and  the  predicted 

value  x(n)  is  given  by 

A  P 

e(n)  =  x(n)  -  x(n)  =  x(n)  +   E  Avx(n-k).   (2-40) 

k=l   K 
e(n)  is  also  known  as  the  residual.   In  the  method  of  least 

squares,  the  parameter  ^'s  are  obtained  as  a  result  of  the 

minimization  of  the  mean  square  error  with  respect  to  all  of 

the  parameters . 
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If  the  signal  x(n)  is  assumed  to  be  a  sample  of  a  random 
process,  then  the  error  in  equation  (2-40)  is  also  a  sample 
of  a  random  process.   In  the  least  square  method,  we  minimize 
the  expected  value  of  the  square  of  the  error. 

2  P 

E[e^(n)]  =E{[x(n)  +   E    Avx(n-k)]z}  (2-41) 

k=l    K 

2 

E[e  (n) ]  is  minimized  by  setting 

8E[e2(n)l   .  ^  1  <    .  <  (2_42) 


3A£ 

From  (2-41)  and  (2-42)  we  obtain  the  set  of  equations 

P 

I   Ak  BKx(n-k)x(n-i)]  =  -E [x (n)x (n-i) ]  1-i-P   (2-43) 
k=l 

Then  the  minimum  average  error  is  given  by 

E  =  E[x2(n)]  +  F  AkE(x(n)x(n-k)  )  (2-44) 

k=l 

For  a  stationary  process  x(n),  we  have 

E[x(n-k)x(n-i)]  =  Rxx(i~k) 
where  P^x(i)  is  the  autocorrelation  of  the  process. 
Note  that  equations (2-42)  and  (2-44)  lead  to  the  well  known 
orthogonality  principle  [7] .   Since  in  the  least  squares 
method,  we  assumed  that  the  input  is  unknown,  it  doesn't  make 
much  sense  to  determine  a  value  for  the  gain  G.   However, 
there  are  certain  interesting  points  that  can  be  made. 
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Equation  (2-39)  can  be  written  as 

x(n)  =  -  ?  A,x(n-k)  +  e(n)  (2-45) 

k=l   K 


Comparing  (2-45)  and  (2-38) ,  it  is  seen  that  the  only  input 
signal  u(n)  that  will  result  in  the  signal  x(n)  as  output  is 
that  where   Gu(n)  =  e(n).  (2-46) 

That  is,  the  input  signal  is  proportional  to  the  error  signal. 
e(n)  can  be  also  considered  as  the  modeling  error.    The  error 
variance  can  be  calculated  by  Equation  (2-41)  and  the  filter 
coefficient  A,  (k=l  .  .  .  p)  would  be  calculated  by  equation 
(2-43)  if  the  correlation  function  of  process  x(n)  is  available 

At  this  moment,  it  should  be  recalled  that  a  linear  random 
process  is  generated  by  linear  filtering  of  white  noise.  Therefore, 
we  are  interested  in  white  noise  inputs  for  the  purpose  of 
modeling       a  given  linear  random  process.   That  is,  the 
input  u(n)  is  assumed  to  be  a  sequence  of  uncorrelated  samples 
(white  noise)  with  zero  mean  and  unit  variance. 

E[u(n)]  =  0,   E[u(n)u(m)]  =  <5mm 

Then  the  output  x(n)  forms  a  stationary  random  process 

x(n)  =  -  ?  Avx(n-k)  +  Gu(n)  (2-47) 

k=l   K 

Multiplying  equation  (2-34)  by  x(n-i)  and  taking  the  expectation, 

E[x(n)x(n-j) ]  =-  I      AVE [x (n-k) x (n-i) ] 

k=l   K 

+E[Gu(n)x(n-i) ]  (2-48) 

Noting  that  u(n)  and  x(n-i)  are  uncorrelated  for  i>0  and  re- 
calling that  for  stationary  process,  E [x (n) x (n-i) ] =R   (i) , 
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Equation  (2-35)  turns  out  to  be 

P 
Rxx(i)  =  -  2   AkRxx(i-k)for  p  >  i>  o  (2-49) 

k=l 

and   R(o)    can  be  obtained   by    plugging  x(n)    of   Equation 

(2-3 8)  into  Equation    (2-48) 

p  2 

Rxx(o)   =  ~      l      A&ik)+G  (2-50) 

k=l 

Therefore ,    the   gain  can   be   given  by 

°2   =R-<0)+   V    Ak*xx<k>  (2-51) 

k=l 

It  is  noted  that  through  Equation  (2-46),  that  is  Gu(n)=e(n)/ 

the  white  noise  input  of  zero  mean  and  unit  variance  generates 

the  random  process  e(n) ,  which  is  again  white  with  zero  mean 

2 
and  variance  of  G  .   Therefore,  from  Equation  (2-4  9) ,    the  re- 
cursive filter  coefficients  A, , (k=l,  ....p)  can  be  calculated 
and  using  these  values  the  gain  G  would  be  calculated  by  Equation 
(2-51)  with  the  knowledge  of  autocorrelation  function  of  a  given 
class  of  linear  random  process.   So  far,  modeling   of  one- 
dimensional  stochastic  process  has  been  considered.   Similar 
reasoning  can  readily  be  extended  to  the  modeling   of  two- 
dimensional  random  fields.  Again,  the  two-dimensional  all- pole 
model  is  considered. 

Mi   M2 
x(k,&)=-£    z        A..x(k-i,£-j)+Gu(k/£)       (2-52) 
i=o  j  =o   J 
(i,j)^(o,o) 
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Let's  define  the  set  ft(k,£)  such  that  for  all  i,j 

(k-i,  PL- j )  en(k,£), 

all  the  values  of  x(k,£)  in  Q,(k,i)    will  be  used  to  estimate 
the  point  x (k, £) . 


K-Ml 


^2 


a   »    • 

V    0    0 

0  • 


•  4  * 


• 


# 

•  »  • 

i 

•  x(k,£-M  ) 
4  •  ♦ 


•  # 


•J» 


fi(k,£) 


x(k,£)#  # 
•  •  •  • 


FIGURE  2-2 
DEFINITION  OF  fi(k,£) 
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Again  the  coefficients  A-  •  will  be  determined  such  that  the 

mean  squared  error  is  minimized.  The  estimate  of  x(k,&) , 

A 

x(k«£),  is  given  by  a   linear  combination  of  the  previous  values 

A 

x(k,M  =  -  ZnEx(k-ifJl-j)  (2-53) 


The  mean  squared  error  is 

E[e2(k,£)]=E{[x(k,£)-  x(k,£)]2}  (2-54) 

If   E[e  (kr£)]  is  minimized,  x(k,£)     is     the  "linear 
least  squares  estimate"  of  x(k,£). 

Going  through  the  same  procedure  as  in  the  one-dimensional 
case,  that  is,  substituting  (2-53)  in  (2-54)  and  differentiating 
with  respect  to  each  A. .,  setting  derivatives  equal  to  zero,  we 
obtain  the  following  set  of  simultaneous  equations  for  the 
unknown  A. . 

E([x(k;t)4(k,Jl)]x(i,j)}  =  o  for  allfij)efi,  (2-55) 
which  says  that  the  coefficients  A. .  must  be  such  that  the 
estimation  error  [x (k,£) -X (k,£) ]  is  statistically  orthogonal  to 
each  x(i,j)  that  is  used  to  form  the  linear  estimate.   This 
is  known  as  the  orthogonality  principle  in  linear  least  square 
estimation. 

The  modeling   error  is  the  difference  between  the  true 

value  x(k,£)  and  the  estimate  x(k,£).  By  definition, 

A 
e(k,£)=x(k,£)-x(k,£)=Gu(k,£)  (2-56) 

from  the  equation  (2-52)  and  (2-53) . 

Again,  we  are  interested  jntbevhite  noise  field  of  zero 

mean  and  unit  variance.     Then  the  modeling   error  is  also  a 


30 


random  field. 

Rewriting  the  Equation  (2-52)  in  terms  of  the  error  e(k,£)  gives 
x(k,£)=-rfiZ  Aj_.x(k-i/£-j)+e(k,£)  (2-57) 

To  calculate  the  error  variance,  multiply  (2-57)  by  x(k-m,£-n) 

and  take   the  expectations 

E[x(k,£)x(k-m,£-n) ]=-  ZfiZAi -E [x (k-i7 £-j ) x (k-mf £-n) ] 

+E[e(k,£)x(k-m,£-n) ]    (2-58) 

For  the  stationary  process, 

E[x(k-i,£-j)x(k-m,£-n)  ] -F^lm-in-j  ]  ,  then  (2-58)  will  be 

RxxCm1n)=-E^ZAi.Rxxfrn-ifn-j)  for  allfm,  n)efi.   (2-59) 

The  second  term  on  the  rLghtside  of  (2-58)  is  zero,  because  of  the 

orthogonality  principle  and  R(0,0)  can  be  obtained  by  the 

following  equation: 

Fj)7Q)=-ZQL    AijRxx(i,j)+G2  (2-60) 

Therefore  the modeling  error  variance, 

E [e2 (k,£) ]=E{ [ (Gu(k,£) ] 2}  =  G2  is  given  by  the  equatiom 
G2=E[e2(k,£)  ]=R^(D,0)+E  I    Ai-j^x  ^'^  and  the  mean  of  error 

e(k,£)  is   E[e(k,£) ]=E[Gu(k,£) ]=0 

Again  with  the  knowledge  of  autocorrelation  function  of  the 

two-dimensional  stationary  random  field,  the  filter  coefficient 

A. .  can  be  obtained  from  the  Equation  (2-59)  and  using  these 

values  of  A. • ,  the  gain  G  in  Equation  (2-52)  can  be  calculated 

by  Equation  (2-60) . 

Example  1   Consider  a  one-dimensional  stationary  band  limited 

random  process  forwhichtheautocorrelation  function  is  given  by 
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R^x(m)  =  />,m|  cos(w0m) 
Find  a  model  which  describes  this  process, 
all-pole  model  is  chosen  such  that 


An 


x(n)  =  -  i      A,x(n-k)+Gu(n) 
k=l   K 


where  E  [e(n) ]  =  0 

E  [u(n)u(m)  ]  =  <Smn 
One  has  to  choose  the  order  of  the  difference  equation;  here,asecond 
order  model  (p=2)  is  chosen.   Then 

x(n)=-A1x(n-l) -A2x (n-2) +Gu (n) 

The  problem  shrinks  down  to  that  of  finding  A,,  A2/  G  with  the 

given  R   (m) . 
3      xx 

From  Equation  (2-4  9) 

k=l 
R^l)  =  -  A-^o)  -  A2^-l) 

W]    "    "   Al^1}     "   A2%0) 
Putting  this  in  matrix   form 

-%D     -  m  0) 


From   Rxx(m)    =    P 


iml 


cos    (w  m) 

o         ' 


-1       -  p  COS    w0 
-  p COS   w      -1 


)     \  A2  J 


0  cos   w< 


P^   cos    2   w, 


Therefore     At  n  ,-i     cO  <■>         \ 

1   =__   £  cox  w„  (1-/°^   cox    2   wq) 


n2  2 

1     -    P        COS        Wo 
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A2  = 


P2sin2w0 


1  -  p2cos2  w0 


From  Equation  (2- 51) 


G   =  R  (0)  + 


xx 


?   A, 


k=l 


XX 


(k) 


=  1  +  A,R  (1)+  A9P   (2) 

1  XX        ^  XX 

=  1  +  A,p  cos  w0+  A2P  cos  2  Wo 

,  _  p2cos2wo (l-ft2cos  2  w°)       p4sin2w0cos  2  w0 
1  -  />2cos2w0  1  -  p2cos2  w0 

Example  2   Given a  stationary  two-dimensional  random  field  with 
Rxx(m,n)  =  p|m|p|n|  , 

find  the  autoregressive  model  of  this  random  field  (most    mono- 
chromatic images  can  be  assumed    to      have  this  form  of  auto- 
correlation function). A  first-order  model  (M  =  1,  M2  =  1)  is 
chosen.   Then  equation  (2-52)  can  be  written  as  follows: 
x(k,A)  =  -7  A10x(k-1,£)-A11x(k-1/  £-l)-A01x(k,£"1)  +Gu(k,£) 
Where  u(k,£)  is  white  noise  with  zero  mean  and  unit  variance.- 

Then,  using  Equation  (2-59) 
1     1 
iym,n)=-  i^Q    £Q     AijRy|m-i,n-j)  for  all(m/  n  )£  fi  . 

(if j)^(o.O) 
Rxx(l„0)  =  -  A1QRxx(0,0)  -  A11Rxx(0  -1)  -A^Rxx  (1  ,-1) 

Rxx(l^l)    =    -   A..  _R       (0,1)     -    AinR       (0.0)     -An1Rxx(l#    0) 
10   xx  11    xx  01 

Rxx(0vl)    =    -   A,.R       (-1,1)-   A...R       (-l,0)-An,R       (0,     0). 
10   xx  11    xx  01    xx 


Putting  thisin  matrix  form  gives 
-Rxx.(0,0)   -Rxx(0,-1)    -Rxx(l,-1) 
-Rxx(0,-1)  -Rxx(0,  0)    -Rxx(lf0) 
-Rxx(-1T1)  -Rxx(-lt0)    -Rxx(0f0) 


N 


10 


11 


L0lJ 


Rxx(lr0); 
Rxx(lTl) 
Rxx(0»l)^ 
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For  thegiven  auto   correlation  function  above 


-1      -f>      -fz 

-P     -1      -7° 
fi2   _;/o     _! 


/ 


10 


11 


l01 


which  yields 


L01 


Lll 


L10 


=    ~P 
=      P: 


The  modeling   error   variance  or  the  square  of  the  gainG     can   be   cal- 
culated  by   Equation    (2-60) 

2  1      X 

G      =Rxx(°'°>+    i=ZQ    ^0   AijRxx(i'^ 

(irj)^(0,0 
=   1   +   A01Rxx(0'1)+  AllRxx(1^+  ^O^x  (1'0) 

=  (1-p2)2 

D  I  ml  pt n  I 
Therefore,  the  complete  model  for  Rxx  =  r    f     is 

x(k,£)  =  px(k,£-l)  +  /3  x(k,  £)  +  px(k-l,£)  +  (l-:f  )u(k,£) 

where  E  (u(k.£)  )  =  0 

Eiu(k.A)  u(k-p,£-q)]  =  O.pq. 

3 .   Method  of  Filter  Response 

Another  method  of  modeling  a  linear  stationary  random 

process  is  based  on  the  concept  thatalinear  random  process  is  a 

result  of  filtering  white  noise  through  a  linear  filter.   In 

Section  B.  of  this  chapter,  the  properties  of  linear  systems 

have  been  discussed. 
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For  the  discrete  system,  it  is  known  that  the  filter  output 
power  spectral  density  is  the  Z  transformation  of  the  output  correla- 
tion function.   That  is, 

Gyy(Z)  =  3  [R.yy(m)] 
Gyy(Zi,  Z2)  =  3  [Ryy(m,n)] 
and  also  it  is  noted  that 

Gyy    (Z)    =    Gxx(Z)    H    (Z)    H    (Z~    ) 

Gyy(Z    ,    Z    )    =   Gxx(Z    ,Z)H(Z,Z)H(Z         ,    Z         ), 

12  12  12  1  2  ' 

where  Gxx(Z),  Gxx(Zi,  Z2)is  the  input  power  spectral  density 
function  and  H(Z),  H  (Zi,  Z2)  is  the  transfer  function  of  the 
filter.   Denoting  the  white  noise  input  as  u(n)oru(k,£)  and 
the  output  of  filter,  which  is  a  linear  stationary  random  pro- 
cess we  are  going  to  model  as  x (n)or  X(k,£)  r   the     problem 
can  be  stated  as  follows:   For  a  given  autocorrelation  function 
ofalinear  random  process  (field)  R(m)  ,  (R  (m,n)),  find  a  linear 
system  such  that  when  the  input  is  white  noise,  the  output  of  the 
filter  gives  a  given  autocorrelation  function  R(m) , (R(m,n)). 


u(n) 


u(k,*) 


H(Z) 


H(ZX,  z2) 


x(n) 

— +- 


FIGURE  2-3 
THE  MODELING  PROBLEM  IN  THE  DISCRETE  CASE 


x(k,4) 
Given  ^(m),  Rxx(m'n) 
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If  the  input  is  white  noise,  then 
GUU(Z)  =  Const, 

Guu(Zl'  Z2}  =  Const. 
Therefore,  the  solution  for  the  required  filter  is  to  find  a 

function  H(Z)  or  H(Z-,,  Zo)  which  satisfies 

G   (Z)  =  Const-H(Z)H(Z_1) 

X^V 

GXX(Z1#  Z2)  =  Const-H^,  Z2)H  (Z£!,  Z2_1)   (2-62) 
where  G„„(Z),  G   (Z,  ,  Z0)  is  known  through  the  relation 

XX         XX    X     ^ 

Gxx<Z)    =   3   IRxx<m>l 

Gxx<Zl'    Z2>    =3[Rxx<m'n'l 

since  R   (m) ,  R   (m,n)  is  given. 

xx      xx         3 

Butforthe  two-dimensional  case,  there  is  an  inherent  difficulty 

in  factorization  of  G   (Zj,  Z  2)  to  H(Z-,,  Z2)H(z7   ,  Z2~  ). 

Therefore,  only  separable  functions  can  be  modeled  by  this 

technique. 

Example  1   For  a  given  stationary  linear  random  process  with 

R   (n)  =  S2p,mI  cos(w0  m) ,  m=o,  1,  2  .  .  . 

Find  a  difference  equation  which  will  give  a  random  process 

with  autocorrelation  above. 

G   (Z)  =  MR   (m)  ] 
xx      ^  xx 

_  g*2(l-|°2)  .  [-Z  /°cos  w0+(l4-^2)-Z~1  cox  wQ]       (2-51) 

(l-2f>Zcos  w0+p2Z2)  (l-2/PZ~1cos  w0+f2z~2) 

The  second  step  is  to  find  a  factored  expression  for  G   (Z) ,i.e. 
GxxZ)  =  G1(Z)  *  Gl^2"1) 
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Assuming  that  G   (Z)  has   the  form 


GXX(Z)  =  32(l-p2) 


az    +   b 


az  +  b 


(2-52) 


l-2/°Z  cos  w0+/^2z~2  l-2pz  cos  w0+p2z 


Comparing  (2-52)  and  (2-51) ,  a  and  b  can  be  obtained  as 
a  =  j   (   1-p  cos  w0+p2    +     l+y°  cos  w0+p 


1  2 

b  =  j    1-p  cos  Wo+p 


1+  P   COS  W0+jO 

From  equation  (2-50),  if  we  choose  H(Z)= 


az1   +   b 


l-2pz'1cos  w0+p2z~2 


2  2 

then   the   term  &    (l-j°    )    in  Equation    (2-52)    can   be   considered 

as   G       (Z) 
uu 


Therefore 


(1-p2)    d2      if  m=0 


Ruu(m) 


0  if  m^O 

and  the  complete  model  is  drawn  block  diagram  form  in  Figure  2-4. 


u(n) 

Hm_   az  +b 

x(n)  has  given 

autocorrelation 

White  Noise 
with 
variance 
=  (1-/32)  2 

l-2pz   cos  wo+p2z-2 

x(z) 

FIGURE  2-4 
FILTER  FOR  ONE-DIMENSIONAL  BAND  PASS  PROCESS 
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lb  put  the  system's  input-output  relation  into  state  equation 


form,  it  is  defined  for  convenience  as 

u(z)  =  z~  w  (z) 
where  w(n)  is  also  white  noise  with  same  autocorrelation 
function 

R(m)  =   (1-P2)tf2    if  m=0 

WW  ' 

0       if  m^O 

then  the  transfer  function  is 

X(Z) 


H(Z)  = 

D  ef ining 
Xx(z) 


az  1  +  b 


U(Z) 


-p2z  1X2  (z) 


-1         2  -2 
l-2pZ   cos  w0+p  z 


X2(z) 


U(z) 


-1   2  -2 
1-2'P  cos  woz   +p  z 


X(Z) 


-1 


=   (az   +b)X2(z) 


then 


and 


Xx(n)  =  -p  x2(n-l) 

X2(n)  =  x1(n-l)  +  2p  cos  w0x2(n-l)  +  w(n-l) 

X(n)   =  ax2 (n-l)+bx2 (n)  =  -P   ax(n)+bx2(n) 


Putting  the   state     and  output  equations  in  matrix  form  gives 


(       i    \\         ( „        ~2   V   .   .  .>      f  -  \ 
x-^n) 


x9  (n) 


0  -  p 

1  2  cosw0 
-2 


x 


1(n-l)> 


i  x2 (n-1 


w(n-l) 


x(n)  =  (-p  a   b) 


x-^n) 
x0  (n) 


Example  2   The  two-dimensional  "band  limited"  discrete 

Markov  process  is  defined  by  the  autocorrelation  function 

t,   /    \    ,2r,  (m)      ,    \ir^(m)    /    \i  m=0.1.2... 
R^On,!!)  -<*•  [frv   cos  (wfm)]  [p2   cos(w2n)]  nmQ'fl't2mmm    then 

the  discrete  power  spectral  density 

Gxx(zl'  Z2>  =6J2(1-Pi2)d-P22)   A(Z1/  22} 


B(ZX,  Z2) 


where 


2-1  2-1 

A(ZX,  Z2)  =  [-Z1Cosw1+(l+y01  )-Z1   CosW-jJ  [-Z2CosW2+(l+j?2  )-Z2  Cos  w2] 

B(?l'  z2)  =  [(1"2r3izicos  wi+/]izi)  d-2p1zj[1cos  w1+p2z^2)] 
[(l-2^2z2cos  w2+^2z2)  (l-^z^cos  w2+(G2z22)] 

Putting  A(Z,,  Z2)  in  the  following  form 

A(ZX,  Z2)  =  [(alfz^1+b1)  (a2z2+b1)  ]  [(a2z21  +b2)  (a2z2+b2)  ] 

and  comparing  this  equation  and  above  A(Z,Z~)t  a,,  a2,  b,  ,  b2 
obtains 


al  j(       Jl-p^os  w1   +  p1   +   V  1  +  p-|_cos  v^+p^) 


a2=   7(       7  3  "P2cos   w2    +  Pi        +       V  :     +  f°2cos   w2+l°2    } 


|(        Jl-p^cos  wx  +  px2      -       yi   +  (o1cos  w1+px2) 


b2=    *(       n/1^003   w2   +  ,p22      -      v/  X   +  f32cos  W2+P22) 
Let  H(Z1,    Z2)    =    (a1z~1+bJ)  (a2z~1+b2) 


(l-2^1z~1cos   w1+/°2z12)  d-2/02z21cos   W2+P2Z22 


39 


then   from    (2-50) 


R      (m,n)    = 

uu 


fO)2(l-P12)  d-P22)       if   n=o,    m=0 
o  if   n^O,    m^O 


u(z1#  z2) 


White  Noise 
of  variance 
=^2(l-p12)(l-f> 


H(Z1;Z2) 


— >- 


FL<(m,n) 


^'"cos^mXpJwSTiJ.n) 


FIGURE  2-5 
FILTER  TO  GENERATE  BAND  PASS  RANDOM  FIELD 


It  is  convenient  to  define . 

U(ZX  ,  Z2)  =  Z1-1Z2"1W(z1/  z2) 

W(k,£)  is  also  white  noise  with  the  same  statistics  as  w(k,£) 

Rww(n'm)=  \  ^(l-f^2)  d-f22)   if  n=0  and  m=0 
0  if  n^O  or  m^O 

Then,  the  filter  has  the  form 

X(Zlf  Z2)  =  (a1z~1+b)  (  a2z2'1+   k>2)  Z1~1Z2~1 


w(zlf  z2) 


(l-2piz~1cos 


-2 


-1 


Wl+PlZl  ^  ^1~2P2Z2  COS  W->"1"P->Z'>  ^ 


2-2; 

2T2Z2 


The  following  definitions  are  made: 
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N1(Zlf    Z2) 


N2(Zir     Z2) 


-piWv  z2' 

Zo-lw(Z    ,    Z    ) 
2 l 2 


cos   w,    + 


2 — ^2" 


1-2^.2:,  """"cos  w,    +    3,    ~, 
(a^1      +   b1)N2(Z1,    Z2) 


1*1 


From   the    last   three  definitions   one  can  write  the  set  of 


difference   equations 


\ 
N-^k,*.) 


N9(k,£) 


/ 


\ 


\ 


0      - 


Pi' 


N1(k/£-l) 


1   If.cos  w 1     N2(k,£-1) 


/      SN 


1 


w(k,£-l) 


x-^k,^    =   -^>,    a1N1(k,£)+b1N2(k,il) 

Now,    additional  definitions   are  made. 


M1(Z1'    V    =    -PIZ2~\{Z1'    Z2} 
M2(Z1,    Z2)    =        Zl      Xl    (Z1'    Z2} 


l-2^2z21   cos   w2   +/>22z2-2 


From  these  definitions    it   follows    that 


-1 


X(Zlf    Z2)    =    (a2z2         +   b2)    M2     (Z1#    Z2)  (2_55) 


Then 
rMx(k,£) 


,  M    (k,£) 


( 


0     -f/ 

1        2f>    COS    w 


2  J 


M^k'-l,*) 


M    (k-l,Jl) 


'    (P 


x1(k-l,£) 


(2-56) 


X(k,£)    =   -/o2^a2M1(k/    I)    +   b2M2(k,£) 

Combining  these  all  together  ,  the    following   form   is   obtained 
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^(k+1,4)^ 


M2(k+1,*) 
N1(k,£+1) 

N2(k/£+l)J 


1 
0 

lo 


"Pi 


2p2cos   w2   -p,a,      b. 


-Pi 

2plCos^ 


M^kffcf 


M2(k,£) 


N1(k,£) 


N2(k,£) 


f 


'"1 

0 
0 

UJ 


w(k,£) 


and 


(k,fc)    =    [-p2   a2 


b0 


0         ] 


rM1(k,4) 

M2(k,£) 

N^kfJl) 

N2(k,£)J 


TVo  methods  of  modeling   have   been  discussed.      Some   of   the  above 
examples  will   be   used    in   later   chapters.      It    should   be   noted 
again   that   the    filter   response  method     cannot     be  used  for   the 

case  where   the   autocorrelation   function    is   nonseparable. 
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III.   ADAPTIVE  FILTERS 

A.   NONRECURSIVE  FILTER 
1.   Introduction 

Many  forms  of  adaptive  filters  have  been  described  in 
the  literature,  some  of  which  have  been  shown  to  be  optimal 
(or  suboptimal)  in  certain  applications.   The  special  form 
of      an  adaptive  nonrecursive  filter  developed  by  Widrow  [1] 
is  reviewed  here  to  give  some  insights  to  the  recursive  adap- 
tive filter  developed  in  next  section. 

The  filter  to  be  considered  here  consists  of  a  tapped 
delay  line,  variable  weights  whose  input  signals  are  the  signals 
at  the  delay  line  taps,  a  summer  to  add  the  weighted  signals, 
and  machinery  to  adjust  the  weights  automatically.   The  impulse 
response  of  such  a  discrete  system  is  completely  controlled 
by  the  weights.   The  adaptation  process  automatically  seeks  an 
optimal  filter  impulse  response  by  adjusting  the  weights. 

Two  kinds  of  processes  take  place  in  the  adaptive  filter: 
training  and  operating.   The  training  (adaptation)  process  is 
concerned  with  adjusting  the  weights, and  the  operating  process 
consists  in  forming  output  signals  by  weighting  the  delay  line 
tap  signals,  using  the  weights  resulting  from  the  training 
process.   During  the  training  process,  an  additional  input 
signal,  "the  reference  (or  desired)  response,"  must  be  supplied 
to  the  adaptive  filter  along  with  the  usual  input  signals. 
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This  requirement  may  in  some  case  restrict  the  use  of 
this  particular  form  of  adaptive  filter.   An  example  illustrat- 
ing the  use  of  the  desired  signal  is  the  case  of  modelling  an 
unknown  system  by  a  discrete  adaptive  filter  as  shown  in  Figure 
3-1.   Here  a  discrete  input  signal  x(n)  is  applied  to  an  un- 
known system  to  be  modeled.   The  discrete  adaptive  model  is 
supplied  with  an  input  x(n).   The  output  of  the  unknown  system 
d(n)  is   compared  with  the  output  y(n)  of  the  adaptive  system. 
This  system  can  self -adapt  to  minimize  the  mean  square  error, 
(throughout   this   thesis,      the  mean  square  error  is  chosen 
as  the  performance  measure),  where  the  error  is  defined  as  the 
difference  between  the  output  of  the  adaptive  model  and  the 
desired  signal  (for  this  problem   the  desired  signal  is  the  output 
of  the  unknown  system  to  be  modeled) . 


►GHH 


adjustable 
weights 


y(n)  Filter 
fc  Output^ 


Input 


Unknown  System 
To  be  modeled 


system 
output 


FIGURE  3-1   MODELLING  OF  UNKNOWN  SYSTEM 
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then  N 

y(n)    =   i|0        w±x    (n-i)  (3-1) 

e(n)    =      y(n)    -  d(n)  (3-2) 

Noting   that   equation    (3-1)    is    the  convolution  summation,  the 
sequence   of  weights (wi)    can   be    seen   asthe    impulse   response 

of   the  adaptive  system. 

2 

The  weights  w-   are  adjusted  to  minimize  E(e  )  . 

It  will  be  shown  that  if  the  input  and  output  signals  of  the 
system  being  modeled   are  stationary,  the  error  signal  has  a 
mean  square  value  which  is  a  quadratic  function  of  the  weight 
settings . 

For  the  minimization  of  mean  square  error,  the  steepest 
descent  method  is  used.  Throughout  this  thesis  ,      the  terms 
"filter  coefficient  updating  process"  and  "adaptation  process"  are 
used  interchangeably  and  it  is  assumed  that  the  input  to  the 
adaptive  system  and  the  desired  signal are  stationary  random 
processes  (or random  fields  for  the  two-dimensional  case)  . 

2 .   Performance  Surface,  Gradient  and  the  Wiener  Solution. 
The  input  signals  are  weighted  and  summed  to  form  an 
output  signal  [Equation  (3-1) ] .   Introducing  the  vector  notation 

such  that    w(n)  =   ^w0 '  wl '  w2  *  *  *  WN^ 

and     X*  =  [x(n),  x(n-l),  x(n-2)  .  .  .  x(n-*l)]     (3-3) 
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Then  E quation  (3-1),  which  describes  the  linear  combination 
(operating  process),  can  be  written  in  matrix  form. 
y(n)  =  "^x  =  x%  (3-4) 


and  e(n)  =  y(n)  -  d(n) 

=  WTx   -  d(n) 
T.he    square  of   this   error    is 

e2(n)    =  TFx"x%~  -    2d(n)   t\TTx      +  d2  (n) 


(3-5) 


(3-6) 


the  mean  square  error,  the  expected  value  of  e  (n)  ,  is 
2, _%,    -2,_x    ^T«    .  ^T 


w 


(3-7) 


E[e  (n)]  =  d*(n)  -  2W^d  +  W-    Rxx 

where  the  vector  of  cross-correlation  between  the  input  signals 
and  the  desired  response  is  defined  as 


E[d(n)  X  (n)  ]  =  E  |  d(n)x(n) 

d(n)x(n-l) 


xd 


d(n)x(n-N)  (3-8) 

and  where  the  correlation  matrix  of  the  input  signal  is  defined 
as 


E[X(n)XT(n)  ]  =  E 


X(n)X(n)   X(n)X(n-l)  .  .  .  X(n)X(n-N) 
X(n-l)x(n)X(n-l)X(n-l)  .  .  .X(n-2)X  (n-N) 


X(n-N)X(n-N) 


A 
=  R. 


xx 


(3-9) 
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It  may  be  observed  from  (3-7)  that  for  stationary  input  signals, 
the  mean-square  error  is  precisely  a  second  order  function  of 
the  weights.   The  mean  square  error  performance  function  may  be 
visualized  as  a  bowl  shaped  surface,  a  quadratic  function  of  the 
weight  variables.   Then  the  adaptive  process  has  the  job  of  con- 
tinually seeking  the  "bottom  of  the  bowl."   A  means  of  accomplish- 
ing this  by  the  well-known  method  of  steepest  descent  is  dis- 
cussed below. 

In  the  non- stationary  case,  the  bottom  of  the  bowl  may  be 
moving,  and    the  orientation  and  curvature  of  the  bowl  may  be 
changing.   The  adaptive  process  has  to  track  the  bottom  of  the 
bowl  when  inputs  are  nonstationary .   The  method  of  steepest 
descent  uses  the  gradient  of  the  performance  surface  in  seeking  its 
minimum.   The  gradient  at  any  point  on  the  performance  surface 
may  be  obtained  by  differentiating  the  mean-square  error  function 
of  Equation  (3-5)  with  respect  to  the  weight  vector.   The 
gradient  is 

V{E(e2(n))}=   "2  Rxd  +  2  Rxx  W  (3-10) 

To  find  the  "optimal"  weight  vector  WTMq,  .i.e.  the  one  that  yields  the 
least  mean  square  error,  set  the  gradient  to  zero.   Accordingly, 

Rxd  =   Rxx  WLMS 

W    =   R   _1  R  (3_11) 

LMS    Rxx    Rxd 

Equation  (3-11)  is  known  as  the  Wiener-Hopf  equation  in 
matrix  form. 
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Then  the  minimum  mean  square  error  may  be  obtained  by 
substituting  (3-11)  into  (3-7) 

E  [e2(n)]  min  =  a"2  (n)  -  w£MS  Rxd  (3-12) 

3.   LMS  Algorithm 

In  seeking  the  minimum  mean-square  error  by  the 
method  of  steepest  descent  ,  one  begins  with  an  initial 
guess  as  to  where  the  minimum  point  of  the  mean-square  error 
surface  may  be.   This  means  that  one  begins  with  a  set  of  initial 
conditions  for  the  weights.   The  gradient  vector  is  then  mea- 
sured, and  the  next  guess  is  obtained  from  the  present  guess 
by  making  a  change  in  the  weight  vector  in  the  direction  of  the 
negative  of  the  gradient  vector.   The  method  of  steepest  descent 
can  thus  be  described  by  the  following  relation 

W(n+1)  =  W(n)  +  kV[E(e2(n))]  (3-13) 

2 

The  expression  for   V[E(e  (n)  )]  is  obtained  by  using 

Equation  (3-10) . 

W(n+1)  =  W(n)  +  2kRxxW    -  2k  Rxd         (3-14) 

2 
The  gradient  vector   V[E(e  (n))]  is  the  gradient  of  the  expectation 

of  the  squared  error  function  when  the  weight  vector  is  W(n) . 

When  the  performance  function  is  quadratic,  the  gradient 
is  a  linear  function  of  the  weights.   The  advantage  of  working 
with  the  quadratic  performance  surface  lies  both  in  this  linear 
relation  and  in  the  fact  that  such  a  surface  has  a  unique- 
minimum  . 
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The  purpose  of  the  adaptation  process  is  to  find  an  exact  or 
an  approximate  solution  to  the  Wiener -Hoff  equation  (3-11) . 
One  way  of  finding  the  optimum  weight  vector  is  simply  to  solve 
(3-10) .   Although  this  solution  is  generally  straight  forward, 
it  could  present  serious  computational  problems  when  the  num- 
ber of  weights  N  is  large  and  when  input  data  rates  are  high. 
In  addition  to  the  necessity  of  inverting  an  H  x  N  matrix,  this 
method  may  require  as  many  as  j\l(N+l)/2  autocorrelation  and  cross 
correlation  measurements  to  be  made  to  obtain  the  elements  of 

^x'  Rxd* 

No  perfect  solution  of  equation  (3-11)  is  possible  in  prac- 
tice to  estimate  perfectly  the  elements  of  the  correlation 
matrices. 

A  method  for  finding  approximation  solutions  to  (3-11)  is 
presented  below.   The  accuracy  of  this  method  is  limited  by 
statistical  sample  size,  since  weight  values  are  found  that 
are  based  on  finite-time  measurements  of  input-data  signals. 

This  method  does  not-  require  explicit  measurements  of 
correlation  functions,  nor  does  it  require  matrix  inversion. 
It  is  the  "LMS"  algorithm  based  on  the  steepest  descent  method. 
This  algorithm  does  not  even  require  squaring,  averaging,  or 
differentiation  in  order  to  make  use  of  gradients  of  mean- 
square  error  functions. 

When  using  the  LMS  algorithm,  changes  in  the  weight  vector 
are  made  along  the  direction  of  the  estimated  gradient  vector. 
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Accordingly 

_       Ar     2 
W(n+1)  =  W(n)  +  kV{ [E(e^(h) )] }  (3-13) 


Where 


W(n)  =  weight  vector  before  adaptation 

W(n+1)  =  weight  vector  after  adaptation 

k      =  scalar  constant  controlling  rate  of  convergence 


A     2,_,„  ^  ,.      c   „r_2 


and  stability  (k<o) 
V [E (e^ (n) ) ] =  estimate  of  gradient  of  Ete^Cn)]  with  respect 
to  W  with  W  =  W(n) 
One  method  for  obtaining  the  estimated  gradient  of  the 
mean  square  error  function  is  to  take  the  gradient  of  a  single 
time  sample  of  the  squared  error;  that  is 

V[E(e2(n)]  =  V[e2(n)]  =  2e(n)  V[e(n)]       (3-14) 
From  Equation  (3-4) 

V[e(n)]  =  V[y(n)-d(n)]  =V [WT(n) X (n) -d (n) ] 


=  X(n)  (3-15) 


Thus, 


A     2 

V[E(ez(n))]  =  2e(n)X(n) 


=  2[WT(n)X(n)-d(n)]X(n)         (3-16) 


The  gradient  estimate  of  (3-16)  is  unbiased,  as  will  be  shown 
by  the  following  argument:   For  a  given  weight  vector  W(n)  ,  the 
expected  value  of  the  gradient  estimate  is : 

E{V  [E(e2(n))]}=  2E{ [WT (n) X(n) -d (n) ] X (n) } 

"  2V"  2Rxd  <3-17> 
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Comparing  (3-17)  and  (3-10) ,  we  see  that 
E{  V[E(e2(n))] }  =  V  E  [e2 (n) ] 


(3-18) 


and  therefore  for  a  given  weight  vector,  the  gradient  estimate 

A     2 

V[E(e  (n) ) ]  is  unbiased. 

Using  the  gradient  estimation  formula  (3-16) ,  the  weight 
iteration  rule  Equation  (3-13)  becomes 

W(n+1)  =  W(n)  +  2k  e(n)  X  (n)  (3-19) 

and  the  next  weight  vector  is  obtained  by  adding  to  the  present 
weight  vector  scaled  by  the  value  of  error.   This  is  the  LMS 
algorithm.   Looking  at  Equation  (3-19) ,  the  adaptation  process 
is  a  simple  first  order  recursion  equation  which  can  be  realized 
as  shown  below. 


X(n) 


2kX(n) 


W(n+1) 


UNIT 
DELAY 


"W7 


-J 


WEIGHT 
SETTING 


e(n) 
FIGURE  -   3-2    FILTER  COEFFICIENT  UPDATING 
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Tothis point,  the      basic   concept  of   LMS   nonrecursive   adaptive    filter 
has  been  introduced  and   reviewed.      More  details   can  be   seen   in    [1]  . 
Widrow    [1]    showed   that   the  weight  vector  mean  converges   to   the 
Wiener    solution  aidthatthe  boundsof  the  step   size   k   should   be    in   the 

region  such  that 

1 


A 
max 


<  k  <   o   for  the  stability  and  convergence, 


where  X  is  the  maximum  eigenvalue  of  R 

max  xx 

4 .   Two-dimensional  adaptive  filter. 
a.   Adaptive  filter  structure 

The  input-output  relation  of  the  two-dimensional 
filter  is  given  by  two-dimensional  convolution. 

P       *3 
y(k,Jt)      l  E  •    wi.  x(k-i,  £-j),   (3-20) 

i=o     j=o       J 

where     y    (k,    I)    is  the  filter  output 

and  W.  .  is  the  fiiite impulse   response  of    filter. 

Here,    it   is  assumed  that  the  input.   X(k,£)    is   a    stationary   random 

field. 

In  Equation  (3-20) ,  a  set  of  two-dimensional 
stationary  input  signals  is  weighted  and  summed  to  form  an  out- 
put signal  and  the  filter  output  is  intended  to  match  a  desired 
(reference)  signal  in  accordance  with  the  minimization  of  mean 
squared  error,  where  the  error  is  the  difference  between  filter 

output  and  desired  signal. 

A 
e(k,£)  =  y(k,£  )  -  d(k,£).  (3-21) 
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Introducing  the  vector  notation  such  that 


and 


oo  ol 


WT 


%  W10  Wll  -•■  Wlq  W20 


,W   ] 

pq 


X±   =  [x(k,JD,  x(k,£-l),  .  .  .x(k,i-q)x(k-l,A)x(k-lfA-l)  .  .  .x(k-l,£-q) 

x(k-2,£)  x(k-p,£  -q)]       (3-22) 

then  Equation (3-20) can  be  written  in  matrix  form. 

y(k,£)  =  ^Fx   =  X^W  (3-23) 


where  W  is  a  weight  vector  of  dimension  (p+1) (q+l)xl 

X  is  a  input  signal  vector  of  dimension  (p+1) (q+l)xl 
The  weight  vector  of  the  filter  is  supposed  to  be  adjusted  in 
the  direction  such  that  performance  criterion  (mean  square  error) 
is  to  be  minimized.   Thus,  the  linear  combinatorial  system 
in  Equation  (3-20)  will  be  given  with  variable  weights. 


x(k,l) 


yOt.D 


HCOM   3-3      LIHEAB  COMBINATORIAI-  -SYSTEM    itf 
TWO-DIMENSIONAL  ADAPTIVE  flLTER 
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In  Figure  3-3,  the  linear  combinatorial  structure  is  given.   Then, 
the  adaptive  nonrecursive  filter  can  be  drawn  as  following: 


r 


x(k,*) 


L 


LINEAR  COMBINATORIAL 
OF  FIGURE  3-3 


Input  I WITH  VARIABLE  WEIGHT 
Stationary  f 
Random 
Input 


7 


y(k,£) 


filter  output 
3— V 


e(k,£) 


FIGURE  3-4   STRUCTURE  OF  NONRECURSIVE  ADAPTIVE 
FILTER 


b.   Wiener  solution 

From  equation  (3-22)  and  (3-23),  the  error  signal 


can  be  written  by 

e(k,Jt)  =  W^X  -  d(k,£) 
The  square  of  this  error  is 


(3-24) 


e2(k,£)  =  WTXXTW  -  2d(k,£)  XTW   +  d2(k,£)   (3-25) 


E  [e2  (k,  £)  ]  =E  [d2  (k,  £)  ]  -SR^W^+W^R^W 


The  expected  value  of  e  (k,£)  is 

(3-26) 

where  the  vector  of  cross  correlations  between  the  input 
signal  and  desired  response  is  defined  as 
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'd(k,£)x(k,   £) 
d(k,  £)x(k,  £-1) 


E[d(k,  £)X]    =   E 


d(k,  £)x(k,  £-q) 
d(k,  £)x(k-l,  £) 
d(k,£)x(k-l,£-q) 


V 


d(k,£)x(k-p, £-q) 


=AR 


xd 


(3-28) 


and  where   the   correlation  matrix   of   the    input   signals    is 
defined   as 


E[X3T]    = 
/ 
xlxl      xl*2     xf3 xlX(p+q)  (q+D 

X2X1      X2X2     X2X3 X2X(P+D  (q+D 

x3'xl      X3X2     X3X3 *3x(p+D  (q+D 


> 


(3-29) 


*P+1 )  q+l)Xtl^p+l)  q+2) x  2 X(.p  tj)  q+]) x  (pf]}  q+| 


=RXX,    where  x(p4])(q+^x(k-p/£-q) 
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The  gradient  at  any  point  on  the  performance  surface  can  be 
obtained  by  differentiating  the  mean  square  error  function  of 
equation  (6) 

V[E(e2(k,*)  )]    =     3  E(e2(k,t) 


3  w 

■   "2Rxd+  2Rxx™ 
To  find  the  "optimal"  weight  vector  W  MS  that  yield  the  least 
mean  square  error,  set  the  gradient  to  zero.   Accordingly, 

Rxd     "    V 


WLMS        Rxx    Rxd  (3-30) 


Equation  (7)  is  the  Wiener  Hopf  equation  in  matrix  form,  again 

the  minimum  mean  square  error  is     obtained  by  substituting 
(3-30)  into  (3-26) . 

E[e2(k,£)]min  =  E[d2(k,£)]  -  WLMST  Rxd      (3-31) 


c.   LMS  Algorithm 
Consider  a  two-dimensional  field  x(k,£)  to  be  processed 
(usually  two-dimensional  filters  are       used  in  process- 
ingdiscrete  two-dimensional  image  fields)  and  assume  that  the 
two-dimensional  field  consists  of  NxN  discrete  points  (which  may  be  a 
sensed  signal  by  NxN  pixel  elements  of  a  sensor)  .    The  adaptation 
processes  is  that  of  adjusting  the  filter  coefficient  W- •  in 
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accordance  of  minimization  of  the  mean  square  error  as  in  the  one- 
dimensional  case.   The  adaptation  scheme  may  be  predetermined  as 
being    columnwise  scanning  or  diagonal,  or  row-wise  scanning. 
Here  the  row  scanning  process  is  adopted  as  shown  in  Figure  3-5. 


p  •  •  §  m 


li  • 
o  • 


k-p •*» 


Region  for  jfc^ 
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p  q 
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o  o 


e(k,i)  - 

■  e(1) \   •   • 

•  •  • 

•  •  •  • 
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•  • 


Region  for(j+l) 
adaptation 


^n 


•    <  »l  » 
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FIGURE  3-5  ADAPTATION  SCHEME 
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Therefore ,theNxNadaptation  processes  will  be  required  to  complete 
the  filtering  of  anNxN  two-dimensional  field.   Using  e(k,£) 
to  denote      '  the  error  at  the  jth  iteration  (then  e(k+l,£)  is 
the  (j+l)th  error),  then  the  filter  coefficient  updating  process 
can  be  described  by 

W(j+1)  =  W(j)  +  uV[E(e2(k,£)] 

where  W(j+1)  =  coefficient  vector  after  adaptation 
W  (j)   =  coefficient  vector  before  adaptation 
u      =  negative  scalar  constant  controlling  rate 
of  convergence  and  stability. 
The  gradient  of  tie  mean  square  error  is  to  be  estimated  by 

V[E(e2(k,£)]  =  V[e2(k,£)] 
where  e(k,£)  =  y(k,£)  -  d(k,£) 
then  W(j+1)  =  W(j)  -  ue (k, I) V  [e(k, I) ] 
=  W(j)  -  2ue(k,£)X 
where  X  is  defined  by  equation  (3-22) . 
Along  with  y(k,l)    =   W^ 

and  e(k.£)  =  y(k,£)  -  d{k,l) , 
the  LMS  algorithm  will  be  completed. 

B.   RECURSIVE  FILTER 
1.   Introduction 

In  the  previous  section,  it  is  shown  that  adaptive  non- 
recursive  filters  have  a  finite  impulse  response;  that  is,  they 
can  produce  only  zeros  with  no  poles  in  the  filter  transfer 
function.   This  limits  the  capability  of  transversal  adaptive 
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filters  in  many  applications.   To  overcome  this  limitation  a 
new  adaptive  filter  structure  is  described  which  is  capable  of 
producing  poles  in  the  transfer  function.   The  basic  configura- 
tion considered  here  is  quite  standard;  that  is,  the  present 
output  sample  of  the  filter  y(n)  is  a  linear  combination  of  the 
present  and  past  samples  of  the  input  x(n),  x(n-l),  .  .x(n-M) 
and  the  past  samples  of  the  output,  y(n-l),  y(n-2).  .  .  y(n-N). 
The  present  output  sample  of  the  filter  is  compared  against  a 
reference  sample.   The  resulting  error  samples  are  used  to  adjust 
the  filter  parameters,  feed  forward  gains  and  feed  back  gains  to 
minimize  some  error  f  unction. TheQie-dimensional  recursive  filter 
is  developed  first,  then  it  is  extended  to  the  two-dimensional 
adaptive  filter. 

Recently  Feintuck  [2]  and  White  [3]  have  proposed  a  technique 
for  making  digital  filters  with  zeros  and  poles  adaptive.   This 
development  may  enhance  the  possiblity  of  obtaining  accurate 
models  for  unknown  systems.  The  new  approach  is  developed  into  an 
algorithm.   It  employs  the  steepest-descent  criterion  for  para- 
meter adjustment  but  it  differs  in  the  estimation  of  mean 
squared  error  gradient  vector  from  Feintuck  [2]  and  Widrow  [1] . 

2 .   One-Dimensional  Adaptive  Recursive  Filter 
a.   Structure 

The  recursive  filter  is  described  by  its  transfer 
function 
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M 
Y(Z)  i=o 


Z         bi      Z 


X(Z)  1   +  I       aL    Z 

i=l  (3-32) 

In  the   time  domain,    the    input-output   relation  of   the  digital 

filter    is   given  by 

M  N 

y(n)    =        I    b±x(n-i)-   I      a.y(n-i)  (3-33) 

i=o  i=l 

where  y(n)  =  nth  sample  of  the  filter  output 

x(n)  =  nth  sample  of  the  filter  input 

a.  =  feedback   coefficients  i  =  1/2 f.  N 

b.  =  feed  forward  coefficients  i=0,l,2..N 

The  output  samples  of  the  filter  are  intended  to  match  those  of 
a  reference  (or  desired)  signal  d(n).       in  accordance  with 
the  minimization  of  some  error  criterion,  the  filter  parameters 
a.,  b.  will  be  adjusted  at  every  iteration.  Thegsneral  scheme  of  the 
adaptive  recursive  structure  is  given  in  Figure  3-6.   The  two 
finite  length  transversal  filters  areusedinthe  forward  path  and 
feedback  path  to  form  the  recursive  filter  of  Equation  (3-33). 
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d(n) 
reference  signal 


Input  Signal 


x(n) 


FIGURE  3-6  ADAPTIVE  RECURSIVE  LMS  FILTER 
USING  TWO  TRANSVERSAL  ADAPTIVE 
FILTERS 


b.   Problem  in  Wiener  Solution 

Introducing  vector  notation  for  the  signals  and  sets 
of  filter  coefficients  we  have 


*-T 


A  -  [  a^,  a.^,    .  .  .  .  aN] 


ttT 


B  =  [  bo'  bl bMl 

X(n)   =  [  x(n),  x(n-l)  .  ...  x(n-M)] 
Y(n)T  =  [  y(n-l)f  y(n-2)  .  .  .  y(n-N)] 
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Then     Equation  (3-33)  can  be  written  as 

y(n)  =  B^Xdi)  -AT  Y(n)  (3-34) 

where  A  is  the  feedback  coefficient  vector  (nxl) 

Bis  the  feed  forward  coefficient  vector [ (M+l) xl] 
X  (n)is  the  iiput  signal  vector  at  nth  iteration  [(M+lxl] 
Y  (n)isths output  signal  vector  at  nth  iteration  (Nxl) 
The  performance  criterion  is  again  minimum  mean  squared  error, 
where  tte  error  is  the  difference  between  filter  output  and  desired 
signal  (reference  signal) . 

That  is,  the  filter  is  used  to  estimate  a  desired  waveform 
d(n)  in  a  minimum  mean  square  error  sense.   Assume  that  the 
observables  are  stationary  and  zero  mean  and  let  e(n)  denote  the 
error  waveform  at  nth  sample,  then 

e(n)  A  y(n)-d(n)  (3-35) 

=  BTX(n)-ATY(n)  -d(n) 
and  the  mean  square  error  is 

E[e2(n)]  =  E[(Brx(n)-ATY(n)-d(n)  )  2] 

=  E [BTX (n) X(n)TB-2BTX(n) Y^n) A+ATY(n) YTA 

-2BTd(n)X(n)+2ATd(n)Y(n)+d2 (n) ] 
=   E  [d2 (n) ] +BTRxxB+ATRyyA-2BrRxyA 

"2lTRdx+2lTRdy  <3"36) 

where   R^   =   E[X(n)XT(n)] 

Ryy    =   E[Y(n)YT(n)] 

Rdx    =   Etd(n)^(n)l 
Rdy    =   E[d(n)Y(n)] 
and        ^   =   E[X(n)YT(n)] 
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The  theory  of  Wiener  filtering  employs  known  second-order  input 
statistics  to  dictate  the  impulse  response  of  the  linear  filter 
that  minimizes  the  mean  square  error;  that  is,  as  in  the  previous 
section,  the  knowledge  of  second  order  statistics  R,  ,  R   is 
assumed  to  calculate  the  optimum  impulse  response 
(optimum  weight  vector  in  nonrecursive  adaptive  filter)  Wn 
But  here  in  the  recursive  algorithm,  it  is  also  required  that 
the  autocorrelation  of  the  output,  R   ,  and  the  cross  correlation 
of  the  output  and  the  input,  R  ,  and  the  cross  correlation  of  the 
output  with  the  desired  waveform, R,  ,  should  be  assumed  known. 
Thus,  the  set  of  statistics  mentioned  above  is  assumed  to  be  known 
for  a  moment,  and  will  be  used  to  determine  the  weights  in 
the  recursive  filter.         The  statistics  for  the  fixed  para- 
meter filter  are  not  a  function  of  these  statistics,  but  instead 

the  weights  are  a  function  of  these  statistics.   Therefore,  R„.., 

xy 

R,   and  R   are  to  be  considered  constant  when  the  differentiation 
ax      yy 

is  made  with  respect  to  A  and  B. 

The  set  of  weights  (filter  coefficient  vectors)  which  minimize 
the  mean  square  error  can  be  found  by  getting  the  gradient  vector 
with  respect  to  filter  parameters  equal  to  zero. 


9E(e2(n)] 

9  A 


=   2R   A  -  2  R  1  +  2  R,   =0 
yy       xy       dy 

R  ~1  (R,   -  R   "B)  (3-37) 

yy     dy    xy 


and 


9  E[e2(n)]    A    Vr^/  2 


9  B 


3[E(e  (n))] 

=  2R  1  -  2R  A  -  2  R, 
xx      xy       dx 


B 


*xx-l  (Rdx  +  *xyX)  (3"38) 


Thus,  one  can  solve  for  the  filter  coefficients  if  all  the 
second  order  statistics  are  known.   But  without  knowing  the 

impulse  response  of  the  filter,  the  R   ,  R,   and  R  can  not  be 

r  e  xy   dy      yy 

calculated  with  only  input  and  reference  signal  statistics. 
Noting  that  we  are  looking  for  the  impulse  response  which 
minimizes  the  mean  square  error  in  some  way,  it  is  clear  that 
R   ,  Rd  ,  R   are  not  available  and  so  the  wiener  approach  is 
not  feasible. 

C.   LMS  ALGORITHM  . 

An  iterative  gradient  search  technique  (the  method  of  steepest 
descent)  is  revisited.    Here,  in  the  recursive  algorithm,  it 
updates  the  filter  coefficients  with  steps  proportional  to  the 
gradient  vector.   This  updating  process  is 

A(n+1)  =  A(n)  +  k  AA 

3. 

=  A(n)  +  k_VA[E(e2(n))]  (3-39) 

B(n+1)  =  B(n)  +  kbAB 


=  B(n)  +  kbvB[E(e2(n)] 


where 


A(n) ,  B(n)   =   filter  coefficient  vectors  before  adaptation 
A(n+1)  B(n+1)  =  filter  coefficient  vectois  after  adaptation 
k  ,  k,    =  scalar  constants  controlling  rate  of  convergence 


and  stability  (k  ,  k\<o) 
2        a   b 

gradie 

to  A  and  B  respectively. 


VA[E(e  (n) ) ] ,  V  [E(e  (n) ) ]  =  gradient  vectors  with  respect 
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The  updating  process  (3-39)  can  be  considered  as  afirst  order  fil- 
tering process  with  an  input  proportional  to  the  gradient  vector. 

2  2 

But  the  gradients  VA.[E  (e  (n)  )  ]  and  VB[E(e  (n)  )  ]  should  be  esti- 
mated because  the  output  statistics  are  not  available  apriori  or 
an  infinite  statistical  sample  would  be  required  to  estimate 
perfectly  the  elements  of  the  correlation  matrices  in  Equations 
(3-37) and  (3-38)  .   A  method  of  estimating  these  gradients  will 
be  presented. 

Widrow  [1]  obtained  the  estimated  gradient  of  the  mean  square 
error  function  by  taking  the  gradient  of  a  single  time  sample  of 
the  squared  error  (instantaneous  estimates)  when  he  discussed 
the  nonrecursive  adaptive  filter  (see  previous  section) . 

Here,  in  this  thesis  work,   a  new  method  of  estimating 

gradients  is  proposed.   This  is  to  approximate  the  mean  squared 

2 
error  E(e  (n)  ]  by  an  average  of  a  finite  number  of  points  at  every  iteration 

and  take  the  gradient  of  this  instead  of  taking  the  instantaneous 

error  square,  that  is,  the  approximation  used  is 

9        1  L-l  0 
E(eMn))~  —  I  e  (n-£)     (3-40) 
L  £  =  0 

2 
For  E(e    (n)),    the  average  of  the  square  error  for  the  previous  L  points 

is   taken  and   then  gradient   is   evaluated  for  the  approximate     mean 

square   error.      The   estimated   gradient  of  mean   square   error    is 

V.E[e2(n)]    =V,[    Lf1e2(n-£)] 
A  A      £  =  0 

VRE[e2(n)]    =VR[    L:1e2(n-£)]  (3-41) 

B  B      £^0 
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For  convenience,the£l)  term  in  (3-4  0)  was  dropped, 
vector  notation  for  error  signal 
e(n)  =  f  e(n) 

e(n-l) 


Introducing  the 


[    e(n-L+l) 

then  it  is  seen  that  the  error  signal  vector  is  an   (Lxl)  vector.     The 
estimated   gradient    (Equation  (3-41) )  can   be   put    into  the  matrix  form 

BLl~    x"' j         vb 

Substituting  the  estimated  gradients  in  Equation  (3-39) ,  the  updating 
process  for  the  filter  coefficients  is: 

^aVA 


VAE[e2(n)]    =   VA[    eT(n)    J (n) ] 
VRE[e2(n)]    =   Vn[    eT(n)    e(n)] 


A(n+1)    =  A(n)    +  KaV,  [£T(n)e(n)] 


and 


B 


(n+1)    =B~(n)    +   KbVB[eT(n)e  (n)  ] 


(3-42) 


The  function  e    (n)    e(n)    is  a  scalar  function  of  the  coefficient 
vectors  A  and  B,    that    is, 

TT(n)T(n)    =   f(A,    B) 
Therefore    ,    by  definition,    the   gradient   of    f (A,    B)    with  respect 
to  A  and   B      is 
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21 

3A 


3f 
"5"a. 

3f 


3a. 


3f 


3f 


3a 


N 


3B 


3f 

TO 
o 

3f 


3b. 


3f 


3b 


M 


It  follows  that 


5  A[E(e2(n)H    =      VA[FT(n)i(n)] 


* 


d  (t^  (n)I (n)  ) 
— 3~a~: 


3(FT(n)e(n)) 
3a. 


3(£T(n)e(n)    )    I 
I  3aN  J 


/ 


27(5)   _L*M 
al 


T  3e (n) 

2£  (n) 


3a. 


27(5)       9e(n) 


^a 


(3-43) 


N  / 


and 


VB(E(e2(n)    ))    =   VB(£(n)T£(n)    ) 


3(£(n)T   £    (n) 

3b„ 


3(£(n)T   £(n) 
3b, 


3(£(n)T£(n) 


3  b 


N 


r 

n) 

o 


2   F(nT     2£^gl 


2   F(n)T     9e(n) 


3b- 


2e(n) 


T     3e(n) 


y 


v 


3b 


N 


(3-44) 
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Consider  the  terms   3e (n) ,  3e (n)   in  equation  (3-43) , (3-44)  , 

8ar,        9b~ 

p     q 

where  p=l,2,...N  and  q  =  0,1,2,...M. 

Since   e (n)    and   e(n)    are  defined   as 

e(n)    A   y(n)    -  d(n) 

—        T 

e (n)    =    (e(n),    e(n-l),    .    .    .    e(n-L+l), 


3e(n) 

7a 


/ 


3e(n) 

~~ 51 

P 

3e(n-l) 

3a 


3e(n-L+l) 

3~£ 


3y(n) 

3a 
P 

3y(n-l) 

3a_ 


3y(n-L+l) 


3a 


p=l, 2 , . . .N 


(3-45) 


and 


3gpl)   _ 
3ba 


r3e(n) 

~T5q 

3e(n+l) 


3    bq 


3e(n-L+l) 
3    bq 


3y(n) 

3bq 

3y(n-l) 

3bq 

• 

3y(n-L+l) 

I      3bq 

q=0,l,2,...M       (3-46) 


Note   that3e(n)    and    3e;(n)      are    (Lxl)    vectors 


3ap 


3bq 


and    e (n)    is  an    (lxL)    vector. 
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Therefore,  V^[E(e2(n))]  and  VB[E(e2(n)  )]  in  Equation  (3-41) 

are      (Nxl)  and  [(M+l)xl]  vectors  respectively. 

Equations (3-45)  and  (3-46)  may  be  considered  as    sensitivity 

vectors  which  tell       how  much  the  change  in  clp  and  bq  affect 

the  outputs  y(n),  y(n-l)  .  .  .  y(n-L+l) . 

To  calculate  the  elements  of  the  estimated  gradients  of  mean  square 

error  in  equation  (3-41) ,  we  should  calculate  the  sensitivity 

vector  of  equation  (3-45)  and  (3-46)  first. 

From  the  recursive  equation  (3-33) , 

9y(n)     a   ,    M  N  , 

=  —     I  b.X(n-i)  -  I   A.y(n-i) 

3ap      32lp  l  i=o   1  i=l  X        I 


=  -  y(n-p)  -   I  A.  ay(n-i}  (3-47) 

i=i  1      3  aP 


p  =  1/  2,  .  .  .  N 


lif-  -  *<»-<>  -A  ^^ 


q  =  0,  1,  2,     ...  M  (3-48) 

The  sensitivity  vector  components  ■  given  by  Equations    (3-47) and 

(3-48)  can  be  interpreted  as  being  the  response  of  a  linear  system 

with  transfer  function. 

1  (3-49) 

H(Z)  =   l+a,z  X  +  a_  z"2+  .  .  .  aMz-N 

Henceforthfthis  will  be  called. the  "sensitivity  filter."   Equation 
(3-49)  is  ai  all  pole  filter  (recursive  filter)  with  input  signals 
[-y(n-p)]  and  [x(n-q)]  respectively. 
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Now,  what    are    the  initial  conditions  characterizing  the  re- 
cursive relationships  of  the  sensitivity  filter? 

x(n-q)  is  the  q  time  units  delayed  signal  of  input  x(n)  to  the 
adaptive  recursive  filter  of  Equation  (3-33) and  y(n-p)  is  the  p 
time  units  delayed  signal  of  1he  output  y(n)  of  the  recursive 
filter. 
From  x(n)  =  o 

y(n)  =  0    for  n<  o  , 
x(n-q)  and  y(n-p)  are  sequences  with  the  first  q  elanents  and  first  p 
elements  zero  respectively. And  since  changes  in  the  ap  and  bq  coeffici- 
ents have  no  effects  on  the  system's  response  until  n=p  and  n=q 
respectively,  it  follows  that  the  initial  conditions  are: 

gip    -  o    for  n=o,  1  ...  p-1 
a^n)   =  o     for  n=o,  1  ...  q-1 

A  summary  of  this  algorithm  is 
the  following : 

1.  Calculate  the  sensitivity  vector  components  through  the 
sensitivity  recursive  filter  by  equations  (3-47)  and  (3-48) . 

2.  Calculate  the  estimated  gradient  by  equation  (3-43)  and  (3-44). 

3.  Calculate  the  filter  coefficient  vector  by  equation  (3-42). 

4.  Calculate  the  filter  output  by  equation  (3-34) 

5.  Form  the  e(n)  vector  i~(n) ,  then  go  back  to  the  1st  step. 

Note  that  due  to  the  fact  that  the  gradient  of  finite  point  square 
error  average  is  used  for  the  estimation  of  the  true  gradient  of 
mean  square,  this  filter   cannot   give  an  optimal  solution,  but  the 
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more  averaging  points  are  used;  the  better  performance  is  expected 
It  shoutl  be  noted  that  if  L=l,  that  is, 


VA[E(e2(n))]  =  VA[e2(n)] 

VB[E(e2(n))]  =  VB[e2(n)]  , 
this  corresponds  to  using  the   instantaneous  error  square  for 
estimatingthe  gradient,  and  this  filter  reduce  s  to  the  adaptive 
recursive  filter  proposed  by  White  [3] . If  the  further  approxima- 
tion is  made      that  the  sensitivity  components  of  equation 
(3-47) ,  (3-48)  are 

3y(n)      /   x 

-4-^ — -     =   -y(n-p) 

3ap      ■*        tr' 


3y(n) 

3bq 


=  x(n-q) 


then   the   estimated   gradient    is 


VA[E(e    (n)    )]    =   -    2    e(n) 


y(n-l) 
y(n-2) 


"> 


and 


VB[E(eA(n))]    = 


2    e    (n) 


y(n-N) 


x(n) 
x(n-l) 


x(n-N) 
J 


=    -2e(n)Y 


(3-50) 


=  2e(n)X 


(3-51) 
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Then  the  filter  coefficient  updating  process  is 

A(n+1)  =  A(n)-2k  e(n)Y 

a. 

B(n+1)  =  B(n)+2kben)X  (3-52) 

where  k  ,  k<  0  and  filter  output 

y(n)  =  B(n)TX-AT(n)Y  (3-53) 

Equations  (3-53) ,  (3-52) ,  (3-51) ,  (3-50)  are  exactly  the  same  as 
the  algorithm  proposed  by  Feintuck  [2] .   This  Feintuck  algorithm 
has  an  advantage  in  simplicity  when  compared  with  the  algorithms 
proposed  by  White  [3]  and  proposed  here  which  require  additional 
recursive  filters  to  generate  the  estimates  of  the  gradient. 
Thus,  it  may  be  useful  to  extend  the  Feintuck  algorithm  to  the 
two-dimensional  recursive  adaptive  filter  for  simplicity.   In 
the  next  section,  the  algorithms  proposed  by  White  and  Feintuck 
are  extended  to  the  two-dimensional  algorithm. 

3 .   Two-dimensional  Recursive  Adaptive  Filter 

In  this  section,  a  mathematical  model  of  the  adaptive 
recursive  filter  for  the  processing  of  two-dimensional  signals 
is  proposed.   This  can  be  considered  as  an  extension  of  Fein- 
tuck's  algorithm  to  two-dimensional  filters. 

Two  transversal  filters  having  the  same  structure  as  the 
linear  combinatorial  system  used  in  the  non-recursive  two- 
dimensional  processor,  are  used  in  the  recursive  processor, 
one  for  the  feedforward  path  and  one  for  the  feedback  path. 

The  two-dimensional  recursive  filter  is  described  by 
its  transfer  function. 


72 


y  (v  z2'  ■  ,L    ?  bijzriz2"j 

1=0      3=0      j 

X    (z1/    z„)  . 

12  n  M      N  -in      -n 

L    _      mn   1        Z 
m=o   n=0 

(m,n)^(0,0) 
In  the    spatial   domain,    the    input-output   relation  of  thedigital    fil- 
ter   is   given   by 

p  q  M         N 

y(kf£)  I  E        b- .x(k-i,£-j)    -     E        Z        a     v  (k-m,£-n) 

i=0        j  =  0  J  n=0   n=0        mn 


(3-54) 


The  following  notation   is   introduced 


B   =  [b    bn,  ....b   b,  _  b, ,  .  .  .  .bn   b_ft b  J 

00    01       oq   10   11       lq   20         pq 

XT  =  [x(k,£)  ,x(k,£-l)  ,  .  .x(k,£-q)x(k-l,£)x(k-l,£-l)  .  .  .x(k-l,£-q) 

x(k-2,  £ x(k-p,  £-q)  ] 

ip 

and  A  =  [a^  a  QT  .  .a^  a  1(a  ir  .  .a1Na20 ^ 

YT=Cy(k,£-l)  ,  y(k,£-2)  ..  .y(k,£-N)  y  (k-l,£  )  y  (k-l,£-l)  ..  .y(k-l,£-N) 

y(k-2,  £) y(k-M,£-N)  J 

The  filter  coefficient  vectors  ~A~ and ~B~  are   [ (M+l)  (N+l) -1] xl 
and  (p+1) (q+l)xl,  respectively,  and  the  input-output  signal  vectors 
are       again  (p+l)(q+l)xl  and  [  (M+l)  (N+l) -l]xl,  respectively . 
Then  equation  (22)  can  be  written  as 

y(kf£)  =  BTX-ATY  (3-55) 

Here,  to  obtain  an  estimate  of  gradients  of  the  mean  square  error 
function,  a  single  sample  of  the  square  error  is  taken.   That  is: 

V[ECe  (k,£))]  =Vte  (k,£)  ]  =2e  (k,£)  V  [e  (k,£  )  ]  ,  and  again  the 
adaptation  scheme  (filter  coefficient  updating  process)  is  used 
in    the  same  fashion  as  in  the  nonrecursive  case  [see  Figure  3-*5]  . 
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Denoting  the 


error  at  jth  iteration  as  e(k,£)  then 


A(j+1)  =  A(j)  +  2kae(j)VA(e(j)] 

B  (j+1)  =  B(j)  +  2kbe(j)VA(e(j)] 
where 

e(k,£)  =  e(j)  =  y(k,£)  -  d(k,£) 


(3-56) 


(3-57) 


The   components cfthevectorsv, [e (k,£) ]    andv„  [e  (k,£>  ]    can   be 


B 


calculated  as  following. 
From  Equation  (3-57) 

VA[e(k,£)]  =VA[y(k,£)] 


3y(k,£) 
3a01 


3y(k,£)   3y(k,£)  . .3y(k,£)  3y(k,£) 


3a 


OM 


3a 


10 

3y(k,£) 


3a 


3a 


MN 


1M 


(3-58) 


3a 


20 


and  VB[e(k,£)]  =Vg[y(k,£)] 


3y(k,l)  ...  3y(k,£)   3y(k,£) 
oo        oq        10 


3y(k,£)   3y(k,£) 


3b" 


3y(k,£) 


3b 


pq 


iq 


(3-59) 


TB 


20 


Note  thatV.  [e  (k,  £)  ]  and  V_  [e  (k,£)  ]  havethe  same  dimensions  as  A"  and 

B,  that  is: 

[  (M+l)  (N+D-l]    x    1   and    [  (p+1)  (q+1)  ] /respectively . 

From  the   recursive   relation  of    Equation    (3-54) 


3y(k,£)                 ri         n       \           r        v             ~y(k-m,£-n) 
Ut— ' y(k-r.t-s)    -     I        I  a        3^        'J- 

rs  m=o  n=o  rs 


and   3y(k,£)      =   x(k-u,y-v)    - 


M 


N 


Tb~ 


3    y(k-m,£-n) 


uv 


m=o   n=o     mn 
(m,n)^C0f0) 


uv 


(3-60) 


(3-61) 


74 


The  recursive  relationship  of  Equations  (3-60)  and  (3-61)  should 
be  noted    again,  which  can  be  implemented  by  additional  recur- 
sive filters. 

Forming  the  instantaneous  error  gradient  of  Equations 
(3-54) ,  (3-55)  using  the  output  of  additional  recursive  filters 
of  Equations  (3-60)  (3-61) ,  the  filter  coefficient  adaptation 
process  of  Equations  (3-56)  (3-57)  can  be  performed.   Note  that 
this  algorithm  corresponds  to  the  two-dimensional  version  of  the 
algorithm  proposed  by  White  [3] . 

If  we  make  the  approximation 

^(k*£)   =  -  y(k-r,£-s) 
9ars 


lVk,Z)      =   x(k-u,  y-v)  , 


TE 


uv 


then  it  follows  that 


VA[e(k,£)]  = 


r 


y(k,£-l) 
y(k,£-2) 


y(k-l,£) 
y(k-l,£-l) 


y(k-2,£) 


A   - 
=   Y 


y(k-M,£-N)  ) 
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and 


VB[e(kf£)]    = 


( 


x(k,£) 
x(k,A-l) 


x(k,A-q) 
x(k-l,£) 


^x(k-p,£-q) 


A   _ 
=    X 


Therefore,    in   this   case,  the  complete  algorithm  is  described  by 

A 
e(kT£)    =   e(j)    =   y(k,£)-d(k,£) 

A(j+1)    =   A(j)-2k   e(j)    Y 

d 

B(j+1)    =   B(j)    +2kbe(j)    X 

and  Y(k*£)  =  BTX  -  AY 
This  is  the  two-dimensional  version  of  the  algorithm  proposed 
by  Feintuck  [2] . 
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IV.   ADAPTIVE  NOISE  CANCELLER 

A.   THE  CONCEPT  OF  ADAPTIVE  NOISE  CANCELLING 

Noise  cancelling  is  a  variation  of  optimal  filtering  that 
is  highly  advantageous  in  many  applications.   Specially  in 
Wiener  filtering  or  Kalman  filtering,  which  are  optimal, 
apriori  knowledge  of  both  signal  and  noise  statistics  are 
required.   Adaptive  filters,  on  the  other  hand,  have  the 
ability  to  adjust  their  own  parameters  automatically,  and  their 
design  requires  little  or  no  apriori  knowledge  of  signal  and 
noise  statistics  while  the  Wiener  approach  utilizes  a  fixed 
parameter  filter  based  on  known  statistics. 

Figure  (4-1)  shows  the  basic  problem  and  the  adaptive 
noise  cancelling  solution  to  it.   It  makes  use  of  a  reference 
input  derived  from  one  (or  more)  sensors  located  at  the  points 
in  the  noise  field  where  the  signal  is  weak  or  undetectable. 
This  input  is  filtered  and  substracted  from  a  primary  input  con- 
taining both  signal  and  noise.   As  a  result  the  primary  noise 
is  attenuated  or  eliminated  by  cancellation. 

At  first  glance,  subtracting  noise  from  a  signal  seems  to 
be  a  dangerous  procedure.  If  done  improperly  it  could  result 
in  an  increase  in  output  noise  power.  If,  however,  filtering 
and  subtraction  are  controlled  by  an  appropriate  adaptive 
process,  noise  reduction  can  be  accomplished  with  little  risk 
of  distorting  the  signal  or  increasing  the  output  noise  level. 
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FIGURE  4-1.   THE  ADAPTIVE  NOISE  CANCELLING  CONCEPT 
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The  following  argument  for  the  above  is  mainly  due  to 

Wldrow,  et  al  [4] .  In  Figure  (4-1) ,  a  signal  S  is  transmitted 

over  a  channel  to  a  sensor  that  also  receives  a  noise  N. 

o 

A  second  sensor  receives  a  noise 
N,  uncorrelated  with  the  signal  but  correlated  in  some  un- 
known way  with  the  noise  N  .   In  addition  to  these  noises, 

additive  random  noises  M  and  M, uncorrelated  with  each  other 

o      1 

and  with  S,  N  and  N,are   present.     Then  the  reference  input 
is 

d  =  s+  n  +  yi  (4-1) 

o    o 

and  the  primary  input 

x  =  N1+  M±  (4_2) 

The  noise  N.+  M   is  filtered  to  produce  an  output  y  that  is  as 
close  a  replica  as  possible  of  N  +  M   .   This  output  is 
subtracted  from  the  reference  input  S  +  N   +  M  to  produce  the 
system  output 

z  =  S  +  N    +  M    -y 

o     o    ■* 

In  other  words,  the  practical  objective  of  the  noise  cancelling 
system  is  to  produce  a  system  output  z  =  S  +  N  +  M  -  y  that  is 
best  fit  in  the  least  square  sense  to  the  signal  S.   This  ob- 
jective is  accomplished  by  feeding  the  system  output  back  to  the 
adaptive  filter  and  adjusting  the  filter  through  the  LMS 
adaptive  algorithm  (described  in  previous  chapter)  to  minimize 
total  system  output  power.   Note  that  the  system  output  serves 
as  the  error  signal  for  the  adaptive  process. 
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Assume   for   the  moment  that  the  noises   M     and  M,    do   not  exist, 

o      1 

then  if  one  knew  the  characteristics  of  the  channels  over 
which  the  noise  N   is  transmitted  to  the  reference  input,  it 
would  be  possible  theoretically  to  design  a  fixed  filter  capable 
of  changing  N.  into  N  .  That  is,  if  the  correct  model  of  this 
transmission  channel,  H(z),  is  obtained,  the  adaptive  filter 
would  be  simply   1       ,         a  fixed  filter. 

HTiT 

Assume  that  S,  N  ,  N, ,  M  ,  M. ,  and  y  are  statistically 

stationary  and  have  zero  means.   Assume  that  S  is  uncorrelated 

with  N  and  N,  ard that  ^andM^  are  uncorrelated  with  each  other  and 

with  S,  N   and  N, ,  and  suppose  that  N   is  correlated  with  N  . 
o      i  1  o 

The  output   z    is 

z=S+N      +M      -y  (4-3) 

o  o 

squaring,  one  obtains 

z2=  S2+  (N   +  M   -  y) 2+  2S  (N   +  M   -  y) (4-4) 
o    o   J  o    o 

Taking  expectations  of  both  sides  and  realizing  that  S  is  un- 
correlated with  N   /    M  and      y,  yields 

o  o  ■*      J 

E[z2]    =   E[S2]    +   E[(N      +   M      -   y)2]    +  2E[S(    N      +   M      -   y)  ] 

o    o  o    o 

=  E[S2)  +  E[(NQ  +  Mq-  y  J2]  (4-5) 

2 

The  signal  power  E [S  ]  will  be  unaffected  as  the  filter  is  ad- 

2 

justed  to  minimize  E[z  ].   Accordingly,  the  minimum  output  power 

is 

min  E[z2]  =  E[S2]  +  min  E [ (N   +  MQ  -  y  ) 2]  (4-6) 

2 
Since  the  filter  is  adjusted  so  that  E(z  )  is  minimized,  therefore 

2 

E [ (NQ  +  MQ  -  y)  ]  is  minimized.   The  filter  output  y  is  then  a 
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best  least  squares  estimate  of  the  noise  N   +  M  .  Moreover, 

2  2 

when  E  [  (N  +  M  -y)  ]  is  minimized,  E[(z-S)  ]  is  also  mini- 
mized.  Since,  from  (4-3) 

z-S  =  MQ  +  NQ  -  y  (4-7) 

Adjusting  or  adapting  the  filter  to  minimize  the  total  output 
power  is  thus  equivalent  to  causing  the  output  z  to  be  a  best 
least  square  estimate  of  the  signal  S  for  a  given  structure  and 
adjustability  of  the  adaptive  filter  and  for  the  given  reference 
input.   The  output  z  will  contain  the  signal  S  plus  noise. 

From  (4-3) ,  the  output  noise  is  given  by  (N   +  M   -  y) .   Since 

2  2 

minimizing  the  E[z  ]  minimizes  the  E [N  +  M  -  y)  ] ,  minimizing 

the  total  output  power  minimizes  the  output  noise  power.   Since 
the  signal  in  the  output  remains  constant,  minimizing  the  total 
output  power  maximizes  the  output  signal  to  noise  ratio.   Note 
that  if  E[(N   -  y)  2]  =  0 can  be  achieved  ,  then  E[z2]  =  E(S2), 
therefore  y  =  N  +  M  and  z  =  S.   In  this  case,  minimizing  output 
power  causes  the  output  signal  to  be  perfectly  noise  free.   Also 
note  that,  on- the  other  hand,  when  the  reference  input  is  com- 
pletely uncorrelated  with  the  primary  input,  the  filter  will  "turn 
itself  off"  and  will  not  increase  output  noise. 

In  this  case,  the  filter  output  y  will  be  uncorrelated  with 
the  primary  input.   The  output  power  will  be 

E[z2]  -  E[(S+  MQ  +  NQ)2]  -  2E[y(s+NQ  +  M   )]  +  E[y2] 

=  E[(S  +  M   +  M  )2]  +  E[y2]  (4-7) 

o    o 


2 

Therefore,  minimizing  output  power  requires  that  E(y  ) 

be  minimized,  which  is  accomplished  by  making  all  weights 

2 
zero,  bringing  E [y  ]  to  zero. 

It  should  be  noted  that  in  applying  adaptive  techniques 
to  a  practical  systems  problem,  the  key  step  lies  in  providing  an 
appropriate  desired  response  signal  for  the  adaptation  process, 
that  is,  the  reference  input  should  be  provided  through  the  ap- 
propriate scheme,  while  the  exact  knowledge  of  statistical 
characteristics  are  not  required.   In  adaptive  modeling  applica- 
tions, the  desired  response  is  generally  available  as  the  output 
of  the  unknown  system  to  be  modeled.   And  also  in  the  noise 
cancelling  scheme  above,  the  reference  input  is  available  by 
sensing     noise  which  is  correlated  with  the  noise  at  the  primary 
input  in  some  manner. 

In  next  section,  the  signal  filtering  problem  is  discussed 
when  no  external  reference  input  free  of  signal  is  available. 

B.   NOISE  CANCELLING  WITHOUT  AN  EXTERNAL  REFERENCE  INPUT 

This  section     is  concerned  with  signal  filtering 
(estimation)    a     noise-corrupted  signal  when  no  external 
reference  input  is  available.   Here,  it  is  assumed  that  only  the 
noise -corrupted  signal  is  available,  that  is,  referring  to  the 
Figure  (4-1)  of  the  previous  section,  the  noise  free  of  .signal  N^ 
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which  is  correlated  with  N   that  corrupted  the  signal  S  is  not 
available;  only  S  +  N   is  available. 

It  is  proposed  to  estimate  the  signal  S  by  cancelling  the 
noise  N   in  some  adaptive  way.   In  the  following,  it  is  shown 
how  a  reference  input  can  be  obtained  for  the  adaptation  process 
under  certain  conditions.   Assume  that  the  noise  corrupted 
signal  x  =  S  +  N  is  composed  of  broad  band  noise  N  and  a  narrow 
band  signal  S,  then  the  autocorrelation  function  of  the  signal 
is  broad  and  that  of  the  noise  is  narrow.   Also  assume  that  noise 
N  is  uncorrelated  with  the  signal  and  that  the  mean  values  of 
both  signal  and  noise  are  zero. 

Consider  a  signal  delayed  by  6  units, 
x(j-6)  =  S  (j-6)    +  n  (j  -  6)  (4-9) 

where  6  is  a  sufficient  number  of  time  units  so  that  the  noise 
component  is  decorrelated,  but  the  signal  component  still 
remains  correlated. 
Then 

E[n(j)  n  (j-6)  ]  =  0 

E(S(j)  S  (j-6)  ]  ?    0,  finite  (4-10) 

For  the  two-dimensional  signal,  a  signal  delayed  by  &-.,    6? 
units  in  the  horizontal  and  vertical  direction  respectively, 
where  6^  and  62  are  sufficient  length  of  spatial  units  such  that 
the  noise  field  would  be  decorrelated  but  the  signal  field  still 
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remains  correlated,  then 

E[N(k,£)  N  (k-61,A-62)]  =  0 

E[S(k,£)  S  (k-6ltl-62)]    f    0,  finite        (4-11) 
and  again  it  is  assumed  the  signal  field  and  noise  field  are  not 
correlated  with  each  other. 

Now  if  this  delayed  signal  is  used  as  a  primary  input  and 
the  original  input  used  as  a  reference  input  to  the  adaptive  filter, 
then  referring  to     Figure  (4-1)  of  previous  section, 
S(j-6)  or  S(k-6n,&  -^-y)    can  be  considered  as  N-^in  Figure  (4-1) 
and  N(j-6)  or  n(k-<5-i,fl  -  62)  as  M^,  and  S(j)  or  S(k,£)  can  be 
considered  as  N  and  N(j)  or  N(k,&)  as  M  in  Figure  (4-2),  re-  • 
spectively . 

From  equation  (4-10)  and  the  assumptions  that  the  signal  and  noise 
are  uncorrelated,  it  is  seen  that  the  assumptions  made  in  the 
last  section  for  the  various  signals  holds  here. 

Therefore,  from  the  argument  in      section   IV-A, ,  the 
filter  output  would  be  a  good  estimate  of  the  signal  S  .  Figure 
4-2  shows  the  noise  cancelling  (or  signal  estimation)  scheme 
xiscussed  above. 
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V.   EXPERIMENT  AND  RESULTS 

In  this  chapter,  a  computer  experiment  is  performed  to 
check  the  feasibility  of  the  algorithms  derived  in  Chapter 
HT  for  certain  applications.   The  signal  estimation  problem 
for  a  noise  corrupted  signal  is  treated  here  for  both  one- 
dimensional  and  two-dimensional  cases.   Nonrecursive  adaptive 
filtering  and  recursive  filtering  have  been  examined  and  the  perfor- 
mances of  adaptive  filters  are  compared  to  that  of  the  optimal 
Wiener  solution.   The  adaptive  noise  cancelling  scheme  is 
used  for  this  application. 

First,  consider  a  band  limited  one-dimensional  signal  S 
corrupted  by  noise  N;     it  is  desired  to  estimate  the  signal. 
If  the  statistics  of  both  signal  and  noise  are  known  apriori, 
a  fixed  optimal  filter  to  estimate  the  signal  can  be  designed 
by  the  Wiener  Hopf  solution  of  equation  (3-11) . 

Here  it  is  assumed  that  these  statistics  are  not  known 
apriori  but         only  that  the  signal  is  narrowband  and 
the  noise  is  a  broad  band  signal  and  the  signal  is  entirely 
uncorrelated  with  the  noise.   Then  the  signal  has  a  wide  cor- 
relation function  while  the  noise  has  a  narrow  correlation 
function.   Separation  of  this  broadband  noise  and  narrowband 
signal  is  now  required  for  the  estimation  of  the  signal. 

It  is  assumed  further  that  thedesired  (or  reference)  signal 
which  is  needed  for  the  adaptive  process  is  not  available,  that 
is,  no  other  possible  reference  signal  is"  available  which  may  have  some 


correlation  with  the  signal  we  want  to  estimate  . 

This   problem  can  be  considered  as  an  adaptive  noise 
cancelling  problem  without  reference  input.   Assuming  that  the 
noise  is  white,  then  from  the  Figure  (4-2)  one  unit  delay  is 
enough  to  decorrelate  the  noise  component  appearing  in  the  adap- 
tive filter  input  from  the  noise  aonponent  in  the  desired  signal.   These 
components  will  thus  appear  in  the  error  but  not  in  the  filter 
output.   The  narrowband  component,  on  the  other  hand,  will  not 
be  decorrelated  by  the  delay  and  will  appear  in  the  adaptive 
filter  output. 

The  input  signal  would  be 
x(j)  =  S(j)  +  N(j) 
where  S(j)  bandlimited  signal 
N(j)  white  noise 
and  the  reference  input  would  be 
d(j)  =  S(j-l)  +  N(j-l)  . 
The  form  of  autocorrelation  function  of  the  signal  is  assumed  as 
R(m)  =  p 'm'cos  w0  m 

For  the  purpose  of  computer  simulation,  the  following  values 
are  assigned: 

p  =  0.95 

w0=  0.025 

and  the  variance  of  noise  is  0.5. 
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For  the  optimum  filter  design  using  the  above values ,a transversal 
filter  having  10  delays  was  used.   From  equation  (3-11). 


W         -1 
LMS  =  R    x  R  , 
xx    xd 


The  autocorrelation  matrix  R   was  computed  as 
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and  the  crosscorr elation  matrix   R    ,    as 
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Then  the  optimum  Wiener  Hopf  solution  gives  the  filter  co- 
efficients as 
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For  simulation  of  the  bandlimited  signal,  the  state  and  out- 
put equations  of  example  1)  in  Section  2  of  Chapter  II  are 
used. 

For  the  nonrecursive  adaptive  filter  application,  again 
10  delays  and  =  -0.005  as  a  step  size  in  LMS  algorithm  were 
used,  and  2  delays  for  both  feedforward  and  feedback  path  and 
=  -0.001  for  ka  ,  kb  (equation (3-42)  ),  were  used  in  the  recur- 
sive filter  application  of  both  Feintuck's  algorithm  and  the 
algorithm  developed  here.   Eight  points  were  used  for  error 
square  averaging  for  the  gradient  estimation  (L=8,  in  Equation 
(3-40)  ) .   The  experimental  results  are  plotted  in  the  following 
along  with  the  descriptions  and  optimal  solution  for  the  purpose 
of  comparison.   The  results  indicate  the  adaptive  recursive 
filter  appears  to  perform  as  well  as  the  optimal  Wiener  filter 
once  it  reaches  a  steady  state  condition. 
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FIGURE  R-l     SIGNAL  STATISTICS 
Autocorrelation  =0.96  m  cos (0.025  m) 
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FIGURE  R-2     NOISE  CORRUPTED  SIGNAL 
NOISE:   WHITE:   Zero  Mean 

:   Variance  =  0.5 
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FIGURE  R-3   WIENER-HOPF  FILTERING 


10  Delays  are  used. 
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FIGURE  R-4 


WIDROW'S  NONRECURSIVE  ADAPTIVE 

FILTERING 

Number  of  delays  used:   10 

Stepsize  Used:   -0.005 
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FIGURE  R-5 


FILTERING  BY  FEINTUCK'S  ALGORITHM 
NUMBER  OF  DELAYS  IN  FEED  FORWARD  PATH:  2 

IN  FEED  BACK  PATH    :  2 
STEPSIZE  USED:   -0.001 
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FIGURE  R-6     FILTERING  BY  THE  ALGORITHM  USING  A 

FINITE  POINT  MOVING  SQUARE  ERROR  AVERAGE 
For  the  estimation  of  gradient 
Number  of  Delays  in  Feed  Forward  Path:   2 

in  Feed  Back  Path    :   2 
Stepsize  Used:   -0.001 
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As  a  second  application/  consider  an  image  sensed  by  an  image 
sensing  device  of  anNxN  sensing  elements  array.   It  is  assumed 
that  this  image  is  composed  of  correlated  background  and  a 
three  diagonal  line  target  trajectory.       This  image  may 
be  interfered  withby  the  internal  noise  of  device  (assumed  white)  . 
Then  the  output    image   includes  three  types  of  processes: 

x(k,£)  =  S(k,£)  +  T(k,£)  +  w(k,£) 
where 

S(k,JL)  =  correlated  background 

T(k,£)  =  Target  strength  (three  diagonal  line) 

W(k,£)  =  noise. 
Again  it  is  assumed  that  no  statistics  are  known  apriori 
and  the  correlated  background  is  a  narrowband  signal.   It  is 
further  assumed  that thecorr elated  background  and  noise  are 
uncorrelated  with  each  other.   It  is  proposed  to  separate 
the  three  diagonal  lines  from  the  background  noise.   Again, the 
same  argument  holds  that  this  problem  is  a  two-dimensional   noise- 
cancelling  problem  in  which  no  reference  is  available.   It  is 
further  assumed  that  the  correlated  background  is  a  band  pass 
process  for  which  the  autocorrelation  function  is 

RSB(m,n)  =  p  ri    pv    >cos  w^n  cos  w^n 

where  Pv,  Py  rePresent  horizontal  and  vertical  direction  cor- 
relation coefficients  respectively. 
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Going  back  to  Figure  (4-2)  in  the  previous  section,  the  delay 
z,    z~   will  be  sufficient  to  decorrelate  the  white  noise 
and  diagonal  line  target.   Then  as  a  result  of  filtering,  the 
system  output  (or  the  residual  field)  will  be  the  desired  sig 
nal.   It  should  be  noted  that  this  residual  signal  is  com- 
posed of  an  estimate   of  three  diagonal  lines  and  white 
noise  as  well  as  seme  granularity  due  to  the  fixed  stepsize[l]. 

The  problem  of  enhancing  the  target  diagonal  line  which 
is  subjected  to  the  noise  (white  noise  and  adaptation  noise) 
is  another  problem  of  interest.   It  will  not  be  considered 
in  this  work. 

For  the  purpose  of  simulation,  the  following  values 
were  used: 

1)  p   =  p,  =   0.96    w,=  w   =  0.143 
J      v    Kh  h   v 

2)  p   =  p,  =   0.99    wu=  w   =  0.143 

J      v  h  h   v 

The  variance  of  correlated  background  =  1.0 

White  noise  variance  =0.1 

Target  diagonal  line  intensity        =  1.8 

For  the  optimal  filter  design,  using  above  values  for  p  = 

p,  =  0.96,  the  Wiener-Hopf  solution  of  Equation  (3-30)is: 

ffLMS  "  Rxx-1  Rxd 
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where  WLHS;    Rxx,    Rxdare    defined  by  equations  (3-22),     (3-28), 

(3-29)}   respectively. 

Using  p=3,    q  =    3    in   equation    (3-20), 
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Only  the  recursive  simulation  is  performed  and  compared  with 
the  nonrecursive  Wiener  filter.   The  simulation  results  are 
shown  on  the  following  pages  and  indicate  that  although  the 
optimal  performance  of  the  Wiener  filter  is  not  achieved, 
the  adaptive  recursive  filter  performs  well. 
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FIGURE    R-7      CORRELATED   BACKGROUND   +    THREE    STREAKS    + 

WHITE    NOISE 
Background:    R(m,n)    =    0  .  9^m' 0.  96^'    cos(0.143   m) cos ( 0. 143n) 

Variance   =1.0 
White   noise:      zero  mean 

variance  =    0.1 
Streaks    intensity:      1.8 
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FIGURE    R-9      RESIDUAL   SIGNAL   AFTER   RECURSIVE    FILTERING 
White  Noise   +   Streaks   +  Adaptation  Noise 
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FIGURE    R-10   Backgrounds   +  White   noise   +   Streaks 

Background:    R(m,n)    =    0.99  m   0.99   n   cos (0 . 143m) cos (0.143n) 

Variance  =   1.0 
White   noise:      zero  mean 

Variance  =    0.1 
Streaks    Intensity:    1.8 
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RESIDUAL    SIGNAL   AFTER   WIENER    FILTERING 
Figure   R-10 
White  Noise   +   Streaks 
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VI.   CONCLUSIONS 

The  study  described  herein  has  developed  a  new  algorithm 
for  the  one-dimensional  signal  filtering  problem  and  extends 
this  to  two-dimensional  processing.   It  is  an  adaptive  recur- 
sive filtering  algorithm  based  on  the  steepest  descent  gradient 
method  which  employs  the  finite  point  square  error  for  the 

* 

gradient  estimation  rather  than  instantaneous  square  error. 
A  simplified  two-dimensional  version  of  this  algorithm 
is  developed.   It  is  designed  to  estimate  the  signal  in  real- 
time operation  in  cases  where  the  statistics  of  both  signal 
and  corrupting  noise  are  not  available  apriori.   The  algorithm 
learns  the  statistics  and  adapts  even  though  it  is  not  optimal, 
which  means  that  it  seeks  the  minimum  of  the  error  criterion. 
It  should  be  noted  that  Widrow's  nonrecursive  adaptive  filtering 
algorithm  gives  the  global  minimum  of  performance  criterion  due 
to  the  fact  that  for  the  stationary  process,  the  mean  square 
error  is  the  quadratic  form  of  weight  vectors,  but  for  the  re- 
cursive adaptive  filter,  local  minima  may  be  found  instead  of  the 
global  minimum.  The  computer  simulation  shows  that  for  the  examples 
considered  here  the  algorithms  presented  learn  the 
statistics  of  signal  and  adapt.    Several  points  can  be  observed 
through  the  experimental  results  [see  Figure  R-l  through  R-12] . 
1)   All  the  algorithms  presented  here  give  a  satisfactory 

result  after  the  transients  die  out  even  though  they  are 

not  optimal. 
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2)  The  algorithm  which  employs  the  finite  point  average 
square  error  for  the  gradient  estimates  gives   more 
rapid  convergence  than     Feintuck's  algorithm.   The 
possible  reasons  may  be  due  to  the  fact  that  the  output 
information  is  fed   back   and  used  for  the  filter  coef- 
ficients updating  process  and  the  sensitivity  information 
propagates  through  the  recursive  equation  as  the  iteration 
proceeds,  while  Feintuck's  algorithm  discards  the  sensi- 
tivity information. 

3)  The  algorithm  developed  here  gives  the  best  results  among 
the  various  algorithms  presented  at  the  expense  of 
complex  hardware.   Note  that  the  lEquired  number  of  addi- 
tional sensitivity  filters  (equations  (3-47),  (3-48)) 
would  be  the  number  of  filter  coefficients,  and  due  to  the 
L  point  averaging  process,  additional  storage  elements  are 
also  needed.    The  possible  reason  for  the   good  results 
may  be  due  to  the  fact  that  the  averaging  process  [equation 

(3-4  0)]  for  the  gradient  estimate  gives   a   smaller 

error  between  true  gradient  and  estimated 

gradients  than  the  gradient  estimate  based  on  instantaneous 

square  error  does} while  both  give     unbiased  gradient 

estimates . 

Due  to  the  emerging  interest  in     adaptive  recursive 
filters,    further  research  on  this  subject  may  be  worthwhile. 
The  following  are  left  open  for  further  research; 
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1)  Comparison  of  steady  state  performance  of  the  recur- 
sive adaptive  filter  with  the  Kalman  filtering  tech- 
nique.  It  would  lead  to  a  better  understanding  of  the 
performance  of  the  recursive  filter  to  express  the 
filter  coefficients  9    in  terms  of  steady-state  Kalman 
filter  gains. 

2)  Mathematical  derivation  of  the  bound  in  step  size  of 
the  filter  coefficient  updating  process  for  convergence 
and  stability.   It  is  believed  that  this  bound  may  be  at 
tainedby setting  up  the  constraints  first  such  that  the 
value  of  performance  criterion  decreases  mono- 
tomically  to  a  minimum  as  the  iteration  progresses. 


3 )  Modification  of  the  algorithms  for  the  case  that  .  partial 
statistics  of  signal  or  noise  are  available  apriori. 

4 )  Derivation  of  the  algorithm  based  on  a   different  per- 
formance criterion  such  as  maximum  likelihood  ratio, 
maximum  signal  to  noise  ratio,  etc. 

5)  Derivation  of  the  algorithm  based  on  the  different  mini- 
mization techniques  such  as  Newton's  method  or  Fletcher- 
Powell  methods,  etc.,  for  a  given  performance  criterion. 
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